Florence-2

Unified vision foundation model by Microsoft for captioning, detection, and segmentation.

Open Source, Self Hosted, Offline Capable, GPU Required (6GB+ VRAM)

About

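Florence-2 is a compact vision foundation model from Microsoft that handles many vision and vision-language tasks through a single prompt-based, sequence-to-sequence interface. A short task prompt selects what the model does: captioning, object detection, phrase grounding, OCR, or segmentation. The result comes back as generated text that is parsed into captions, boxes, or polygons. The released checkpoints are small (roughly 0.23B parameters for the base variant and 0.77B for the large one), yet the same weights cover all of these tasks. A Transformers implementation is published on Hugging Face, and the model is released under the MIT license.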
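
The prompt-based interface described above can be sketched in Python roughly as follows. This is a minimal sketch, not an official recipe: it assumes the task tokens and loading pattern shown on the microsoft/Florence-2 model cards on Hugging Face, and the names TASK_PROMPTS, build_prompt, and run_florence are hypothetical helpers introduced here for illustration. The model code is loaded with trust_remote_code=True, and compatibility may depend on the installed Transformers version.

```python
# Task tokens as listed on the microsoft/Florence-2 model cards
# (assumed here; check the card for the checkpoint you use).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detection": "<OD>",
    "grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "ocr": "<OCR>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text_input=None):
    """Return the text prompt for a task; some tasks append free text."""
    token = TASK_PROMPTS[task]
    return token if text_input is None else token + text_input


def run_florence(image_path, task, text_input=None,
                 model_id="microsoft/Florence-2-base"):
    """Run one task on one image and return the parsed result (hypothetical helper)."""
    # Heavy imports are kept local so the prompt helper works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(task, text_input), images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Keep special tokens: the post-processor needs the location tokens.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

Under these assumptions, a call such as run_florence("photo.jpg", "detection") would return the parsed detection result for a local image, and run_florence("photo.jpg", "grounding", "a green car") would ground the given phrase. Only the prompt changes between tasks; the model and the generation call stay the same.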

Details

Category: Computer Vision & Object Detection
Price: Free
Platform: Local/Desktop
Difficulty: Intermediate (3/5)
License: MIT
Minimum VRAM: 6 GB
Added: Apr 3, 2026

Tags

vision foundation-model microsoft captioning detection ocr

Related Tools

Depth Anything V1

Computer Vision & Object Detection

Foundation model for monocular depth estimation by TikTok.

Open Source, Self Hosted, Offline, GPU 4GB+

Easy

SigLIP

Computer Vision & Object Detection

Improved vision-language model by Google using sigmoid loss for contrastive learning.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate

OWL-ViT

Computer Vision & Object Detection

Open-vocabulary object detection model by Google using vision transformers.

Open Source, Self Hosted, Offline, GPU 6GB+

Intermediate

MMDetection

Computer Vision & Object Detection

OpenMMLab detection toolbox with 300+ pre-trained models and 80+ algorithms.

Open Source, Self Hosted, Offline, GPU 8GB+

Advanced

Featured

Ultralytics YOLO

Computer Vision & Object Detection

State-of-the-art real-time object detection supporting YOLOv5 through v11.

Open Source, Self Hosted, Offline

Easy

ArcFace

Computer Vision & Object Detection

Additive angular margin loss for deep face recognition.

Open Source, Self Hosted, Offline, GPU 4GB+

Intermediate