
How Sliced Inference Works

The Problem: Small Objects in Large Images

Standard object detectors resize input images to a fixed resolution (e.g. 640x640) before running inference. When your source image is much larger -- say a 4K drone photo or a satellite tile -- small objects get downscaled to just a few pixels and become undetectable.

[Figure: sliced inference]

The Solution: Slice, Detect, Merge

SAHI solves this in three steps:

1. Slice the image into overlapping tiles

The input image is divided into a grid of smaller patches. Each patch is sized to match what the detector expects (e.g. 512x512), so objects within each patch retain enough pixel detail for reliable detection.

Overlapping regions between tiles ensure that objects sitting on a tile boundary are fully visible in at least one patch.

+--------+--------+--------+
|        |overlap |        |
|  tile  |<------>|  tile  |
|   1    |        |   2    |
+--------+--------+--------+
|overlap |        |overlap |
|  tile  |  tile  |  tile  |
|   3    |   4    |   5    |
+--------+--------+--------+

Key parameters:

Parameter                                    What it controls
-------------------------------------------  ------------------------------------------------------
slice_height / slice_width                   Size of each tile in pixels
overlap_height_ratio / overlap_width_ratio   Fraction of overlap between adjacent tiles (0.0 -- 1.0)
auto_slice_resolution                        Let SAHI pick tile sizes based on image resolution
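The slicing geometry above can be sketched in a few lines of plain Python. This is a simplified illustration, not SAHI's actual implementation; the parameter names mirror the table, and the edge-handling strategy (shifting the last tile back so every tile stays full-sized) is one reasonable choice among several.

```python
def compute_slices(image_w, image_h, slice_width=512, slice_height=512,
                   overlap_width_ratio=0.2, overlap_height_ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) tiles covering the image
    with the requested overlap between adjacent tiles."""
    step_x = int(slice_width * (1 - overlap_width_ratio))
    step_y = int(slice_height * (1 - overlap_height_ratio))
    boxes = []
    y = 0
    while True:
        y_max = min(y + slice_height, image_h)
        x = 0
        while True:
            x_max = min(x + slice_width, image_w)
            # Shift the last tile in each row/column back so it stays
            # full-sized and inside the image bounds.
            boxes.append((max(x_max - slice_width, 0),
                          max(y_max - slice_height, 0),
                          x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes

# A 1024x1024 image with 512px tiles and 20% overlap yields a 3x3 grid.
tiles = compute_slices(1024, 1024)
```

With 20% overlap the stride is 0.8 of the tile size, which is why a 1024px side needs three 512px tiles rather than two.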

2. Run the detector on every tile

Each tile is passed through the object detection model independently. Because the tiles are small, objects that were tiny in the full image now occupy a meaningful portion of the input and can be detected reliably.

Optionally, SAHI also runs the detector on the full image at its native resolution (perform_standard_pred=True, the default). This catches large objects that might get split across multiple tiles.
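Conceptually, this step is a loop over the tiles, with each tile's local detections shifted by the tile origin back into full-image coordinates. The sketch below uses a stand-in `detect` function in place of a real model; SAHI's actual per-tile loop lives inside its prediction utilities.

```python
def detect(pixels):
    # Stand-in for a real model call; returns boxes in *tile-local*
    # coordinates as (x_min, y_min, x_max, y_max, score).
    return [(10, 10, 50, 50, 0.9)]

def sliced_predict(image, tiles, perform_standard_pred=True):
    predictions = []
    for (tx, ty, tx2, ty2) in tiles:
        tile_pixels = image  # real code would crop image[ty:ty2, tx:tx2]
        for (x1, y1, x2, y2, score) in detect(tile_pixels):
            # Offset tile-local boxes by the tile origin to obtain
            # full-image coordinates.
            predictions.append((x1 + tx, y1 + ty, x2 + tx, y2 + ty, score))
    if perform_standard_pred:
        # One extra pass over the whole image catches large objects
        # that would be split across tiles.
        predictions.extend(detect(image))
    return predictions
```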

3. Merge predictions back to the full image

Tile-level predictions are mapped back to full-image coordinates. Because tiles overlap, the same object will often be detected in multiple tiles. SAHI applies a postprocessing step to merge or suppress these duplicates:

  • GreedyNMM (default) -- Greedily merges overlapping boxes into their union, keeping the higher score. Best for most use cases.
  • NMM -- Non-Maximum Merging. Similar to GreedyNMM but processes all overlaps simultaneously.
  • NMS -- Non-Maximum Suppression. Keeps the highest-scoring box and discards overlapping ones. Use when you want strict, non-merged detections.
  • LSNMS -- Large-Scale NMS. A faster approximate NMS for scenes that produce very many boxes.

The merge step can use different overlap metrics:

  • IOS (Intersection over Smaller) -- More aggressive merging; good when object sizes vary widely.
  • IOU (Intersection over Union) -- Standard metric; more conservative.
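A simplified greedy merge using these two metrics might look like the following. This is illustrative only (SAHI's postprocessing classes implement the real logic); it merges matched boxes into their union and keeps the maximum score, as in the GreedyNMM description above.

```python
def overlap(a, b, metric="IOS"):
    """a, b: (x1, y1, x2, y2). IOS divides the intersection area by the
    smaller box's area; IOU divides it by the union's area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    denom = min(area_a, area_b) if metric == "IOS" else area_a + area_b - inter
    return inter / denom if denom else 0.0

def greedy_merge(preds, match_threshold=0.5, metric="IOS"):
    """preds: [(x1, y1, x2, y2, score)]. Greedily merge boxes whose
    overlap exceeds the threshold into their union, keeping the max score."""
    preds = sorted(preds, key=lambda p: p[4], reverse=True)
    merged = []
    while preds:
        keep = preds.pop(0)
        rest = []
        for p in preds:
            if overlap(keep[:4], p[:4], metric) > match_threshold:
                keep = (min(keep[0], p[0]), min(keep[1], p[1]),
                        max(keep[2], p[2]), max(keep[3], p[3]),
                        max(keep[4], p[4]))
            else:
                rest.append(p)
        preds = rest
        merged.append(keep)
    return merged
```

Note how IOS is the more aggressive metric: a small box fully inside a large one scores 1.0 under IOS but can score arbitrarily low under IOU, which is why IOS merges more readily when object sizes vary.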

When to Use Sliced Inference

Sliced inference helps most when:

  • Your images are significantly larger than the model's input resolution
  • You need to detect small objects (vehicles in satellite images, people in wide-angle surveillance, defects in high-res inspection photos)
  • Standard detection misses objects or produces low confidence scores

It may not be necessary when:

  • Your images are already close to the model's input size
  • You only care about large, prominent objects
  • Inference speed is more important than recall

Tuning Tips

Tile size: Match the detector's training resolution. For YOLO models trained at 640x640, slices of 512--640 work well.

Overlap ratio: Start with 0.2 (20%). Increase to 0.3--0.4 if you notice missed detections at tile boundaries. Higher overlap means more tiles and slower inference.

Standard prediction: Keep perform_standard_pred=True unless you are certain all objects of interest are small. The full-image pass catches large objects that would be split across tiles.

Postprocessing threshold: The postprocess_match_threshold controls how aggressively duplicates are merged. Lower values merge more; higher values keep more separate boxes. Default of 0.5 works for most cases.
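Putting the tuning parameters together, an end-to-end call might look like the sketch below. It assumes SAHI's documented `AutoDetectionModel` / `get_sliced_prediction` API; the `model_type` value and file paths are placeholders you would adapt to your own setup.

```python
def run_sliced_inference(image_path, weights_path):
    # Imports kept inside the function so this sketch can be read and
    # loaded without SAHI installed.
    from sahi import AutoDetectionModel
    from sahi.predict import get_sliced_prediction

    model = AutoDetectionModel.from_pretrained(
        model_type="ultralytics",      # placeholder; depends on your backend
        model_path=weights_path,
        confidence_threshold=0.4,
    )
    result = get_sliced_prediction(
        image_path,
        model,
        slice_height=640,              # match the detector's training size
        slice_width=640,
        overlap_height_ratio=0.2,      # starting point; raise if boundary misses
        overlap_width_ratio=0.2,
        perform_standard_pred=True,    # full-image pass for large objects
        postprocess_type="GREEDYNMM",
        postprocess_match_metric="IOS",
        postprocess_match_threshold=0.5,
    )
    return result.object_prediction_list
```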

Next Steps