YOLOE Model¶
sahi.models.yoloe¶
Classes¶
YOLOEDetectionModel¶
Bases: UltralyticsDetectionModel
YOLOE Detection Model for open-vocabulary detection and segmentation.
YOLOE (Real-Time Seeing Anything) is a zero-shot, promptable YOLO model designed for open-vocabulary detection and segmentation. It supports text prompts, visual prompts, and prompt-free detection with internal vocabulary (1200+ categories).
Key Features
- Open-vocabulary detection: Detect any object class via text prompts
- Visual prompting: One-shot detection using reference images
- Instance segmentation: Built-in segmentation for detected objects
- Real-time performance: Maintains YOLO speed with no inference overhead
- Prompt-free mode: Uses internal vocabulary for open-set recognition
Available Models
Text/Visual Prompt models:
- yoloe-11s-seg.pt, yoloe-11m-seg.pt, yoloe-11l-seg.pt
- yoloe-v8s-seg.pt, yoloe-v8m-seg.pt, yoloe-v8l-seg.pt

Prompt-free models:
- yoloe-11s-seg-pf.pt, yoloe-11m-seg-pf.pt, yoloe-11l-seg-pf.pt
- yoloe-v8s-seg-pf.pt, yoloe-v8m-seg-pf.pt, yoloe-v8l-seg-pf.pt
Usage with text prompts
```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Load YOLOE model
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yoloe",
    model_path="yoloe-11l-seg.pt",
    confidence_threshold=0.3,
    device="cuda:0",
)

# Set text prompts for specific classes
detection_model.model.set_classes(
    ["person", "car", "traffic light"],
    detection_model.model.get_text_pe(["person", "car", "traffic light"]),
)

# Perform prediction
result = get_prediction("image.jpg", detection_model)
```
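The returned result is a standard SAHI prediction result. The snippet below is a minimal sketch, assuming SAHI's documented ObjectPrediction fields (category, score, bbox), of how to read out the detections:

```python
# Inspect detections via SAHI's standard ObjectPrediction fields
for prediction in result.object_prediction_list:
    print(
        prediction.category.name,          # class label from the text prompts
        round(prediction.score.value, 3),  # confidence score
        prediction.bbox.to_xyxy(),         # [xmin, ymin, xmax, ymax]
    )
```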
Usage for standard detection (no prompts)
```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load YOLOE model (works like standard YOLO)
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yoloe",
    model_path="yoloe-11l-seg.pt",
    confidence_threshold=0.3,
    device="cuda:0",
)

# Perform sliced prediction without prompts (uses internal vocabulary)
result = get_sliced_prediction(
    "image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
```
Note
- YOLOE models perform instance segmentation by default
- When used without prompts, YOLOE performs like standard YOLO11 with identical speed
- For visual prompting, see the sketch below and the Ultralytics YOLOE documentation
- YOLOE achieves +3.5 AP over YOLO-Worldv2 on LVIS with 1.4x faster inference
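As a minimal sketch of visual prompting, based on the Ultralytics YOLOE API rather than SAHI's wrapper (the box coordinates and image path are placeholders):

```python
import numpy as np
from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOEVPSegPredictor

model = YOLOE("yoloe-11l-seg.pt")

# A reference bounding box (xyxy) and its class id serve as the one-shot prompt
visual_prompts = dict(
    bboxes=np.array([[221.5, 405.8, 344.9, 857.5]]),
    cls=np.array([0]),
)

results = model.predict(
    "image.jpg",
    visual_prompts=visual_prompts,
    predictor=YOLOEVPSegPredictor,
)
```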
References
- Paper: https://arxiv.org/abs/2503.07465
- Docs: https://docs.ultralytics.com/models/yoloe/
- GitHub: https://github.com/THU-MIG/yoloe
Source code in sahi/models/yoloe.py
Functions¶
load_model()¶
Loads the YOLOE detection model from the specified path.
Initializes the YOLOE model with the given model path or uses the default 'yoloe-11s-seg.pt' if no path is provided. The model is then moved to the specified device (CPU/GPU).
By default, YOLOE works in prompt-free mode using its internal vocabulary of 1200+ categories. To use text prompts for specific classes, call model.set_classes() after loading:
```python
model.set_classes(["person", "car"], model.get_text_pe(["person", "car"]))
```
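For reference, a minimal sketch of instantiating the class directly; the constructor arguments shown are inherited from UltralyticsDetectionModel, and load_at_init is assumed to trigger load_model() during construction:

```python
from sahi.models.yoloe import YOLOEDetectionModel

detection_model = YOLOEDetectionModel(
    model_path="yoloe-11s-seg.pt",  # defaults to this weight if omitted
    confidence_threshold=0.3,
    device="cpu",
    load_at_init=True,              # calls load_model() during construction
)
```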
Raises:
| Type | Description |
|---|---|
| TypeError | If the model_path is not a valid YOLOE model path or if the ultralytics package with YOLOE support is not installed. |
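A hedged sketch of guarding against this failure mode (the invalid path is illustrative):

```python
try:
    detection_model = YOLOEDetectionModel(model_path="not-a-yoloe-model.pt")
except TypeError as error:
    # Raised when the weights are not a valid YOLOE model, or the installed
    # ultralytics version lacks YOLOE support
    print(f"Could not load YOLOE model: {error}")
```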