How Sliced Inference Works¶
The Problem: Small Objects in Large Images¶
Standard object detectors resize input images to a fixed resolution (e.g. 640x640) before running inference. When your source image is much larger -- say a 4K drone photo or a satellite tile -- small objects get downscaled to just a few pixels and become undetectable.
The Solution: Slice, Detect, Merge¶
SAHI solves this in three steps:
1. Slice the image into overlapping tiles¶
The input image is divided into a grid of smaller patches. Each patch is sized to match what the detector expects (e.g. 512x512), so objects within each patch retain enough pixel detail for reliable detection.
Overlapping regions between tiles ensure that objects sitting on a tile boundary are fully visible in at least one patch.
+--------+--------+--------+
| |overlap | |
| tile |<------>| tile |
| 1 | | 2 |
+--------+--------+--------+
|overlap | |overlap |
| tile | tile | tile |
| 3 | 4 | 5 |
+--------+--------+--------+
Key parameters:
| Parameter | What it controls |
|---|---|
slice_height / slice_width |
Size of each tile in pixels |
overlap_height_ratio / overlap_width_ratio |
Fraction of overlap between adjacent tiles (0.0 -- 1.0) |
auto_slice_resolution |
Let SAHI pick tile sizes based on image resolution |
2. Run the detector on every tile¶
Each tile is passed through the object detection model independently. Because the tiles are small, objects that were tiny in the full image now occupy a meaningful portion of the input and can be detected reliably.
Optionally, SAHI also runs the detector on the full image at its native
resolution (perform_standard_pred=True, the default). This catches large objects
that might get split across multiple tiles.
3. Merge predictions back to the full image¶
Tile-level predictions are mapped back to full-image coordinates. Because tiles overlap, the same object will often be detected in multiple tiles. SAHI applies a postprocessing step to merge or suppress these duplicates:
- GreedyNMM (default) -- Greedily merges overlapping boxes by averaging their coordinates and scores. Best for most use cases.
- NMM -- Non-Maximum Merging. Similar to GreedyNMM but processes all overlaps simultaneously.
- NMS -- Non-Maximum Suppression. Keeps the highest-scoring box and discards overlapping ones. Use when you want strict, non-merged detections.
- LSNMS -- Location-Sensitive NMS. A variant that factors in spatial location.
The merge step can use different overlap metrics:
- IOS (Intersection over Smaller) -- More aggressive merging; good when object sizes vary widely.
- IOU (Intersection over Union) -- Standard metric; more conservative.
When to Use Sliced Inference¶
Sliced inference helps most when:
- Your images are significantly larger than the model's input resolution
- You need to detect small objects (vehicles in satellite images, people in wide-angle surveillance, defects in high-res inspection photos)
- Standard detection misses objects or produces low confidence scores
It may not be necessary when:
- Your images are already close to the model's input size
- You only care about large, prominent objects
- Inference speed is more important than recall
Tuning Tips¶
Tile size: Match the detector's training resolution. For YOLO models trained at 640x640, slices of 512--640 work well.
Overlap ratio: Start with 0.2 (20%). Increase to 0.3--0.4 if you notice missed detections at tile boundaries. Higher overlap means more tiles and slower inference.
Standard prediction: Keep perform_standard_pred=True unless you are certain
all objects of interest are small. The full-image pass catches large objects that
would be split across tiles.
Postprocessing threshold: The postprocess_match_threshold controls how
aggressively duplicates are merged. Lower values merge more; higher values keep
more separate boxes. Default of 0.5 works for most cases.
Next Steps¶
- Quick Start -- Get up and running with SAHI
- Model Integrations -- Use SAHI with your detection framework
- Postprocessing Backends -- Configure NMS/NMM backend for speed