SAHI with TorchVision for Sliced Inference¶
0. Preparation¶
- Install the latest versions of SAHI and torchvision:
In [ ]:
!pip install -U git+https://github.com/obss/sahi
!pip install torch torchvision
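If you want to confirm which versions the install picked up, a quick sanity check (not part of the original notebook) is to print them:

import sahi
import torch
import torchvision

# SAHI is installed from the main branch above, so expect a recent version string
print(sahi.__version__, torch.__version__, torchvision.__version__)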
In [ ]:
# check the current working directory; demo_data/ will be created relative to it
import os
os.getcwd()
- Import required modules:
In [7]:
# import required functions, classes
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction, predict, get_prediction
from sahi.utils.file import download_from_url
from sahi.utils.cv import read_image
from IPython.display import Image
In [8]:
# set torchvision FasterRCNN model
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
# download test images into demo_data folder
download_from_url('https://raw.githubusercontent.com/obss/sahi/main/demo/demo_data/small-vehicles1.jpeg', 'demo_data/small-vehicles1.jpeg')
download_from_url('https://raw.githubusercontent.com/obss/sahi/main/demo/demo_data/terrain2.png', 'demo_data/terrain2.png')
1. Standard Inference with a TorchVision Model¶
- Instantiate a SAHI detection model by wrapping the torchvision model and setting the confidence threshold, image size and other parameters:
In [9]:
detection_model = AutoDetectionModel.from_pretrained(
model_type='torchvision',
model=model,
confidence_threshold=0.5,
image_size=640,
device="cpu", # or "cuda:0"
load_at_init=True,
)
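If a GPU is available you can pick the device programmatically instead of hard-coding "cpu"; a minimal sketch using plain PyTorch with the same AutoDetectionModel call as above:

import torch

# use the first GPU when one is visible, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

detection_model = AutoDetectionModel.from_pretrained(
    model_type='torchvision',
    model=model,
    confidence_threshold=0.5,
    image_size=640,
    device=device,
    load_at_init=True,
)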
- Perform prediction by passing an image path and a DetectionModel instance to the get_prediction function:
In [10]:
result = get_prediction("demo_data/small-vehicles1.jpeg", detection_model)
result = get_prediction("demo_data/small-vehicles1.jpeg", detection_model)
- Alternatively, pass a numpy image (as returned by read_image) and a DetectionModel instance to get_prediction:
In [5]:
result = get_prediction(read_image("demo_data/small-vehicles1.jpeg"), detection_model)
- Visualize predicted bounding boxes and masks over the original image:
In [11]:
result.export_visuals(export_dir="demo_data/")
Image("demo_data/prediction_visual.png")
Out[11]:
2. Sliced Inference with a TorchVision Model¶
- To perform sliced prediction we need to specify the slice parameters. In this example, prediction is performed over 320x320 slices with an overlap ratio of 0.2:
In [7]:
result = get_sliced_prediction(
"demo_data/small-vehicles1.jpeg",
detection_model,
slice_height = 320,
slice_width = 320,
overlap_height_ratio = 0.2,
overlap_width_ratio = 0.2,
)
Performing prediction on 12 slices.
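The slice count follows from the image size, the slice size and the overlap: with an overlap ratio of 0.2, consecutive 320x320 slices are shifted by roughly 320 * (1 - 0.2) = 256 pixels in each direction. A rough, illustrative estimate (not SAHI's exact internal slicing logic, which also adjusts the edge slices):

import math

def approx_num_slices(image_w, image_h, slice_w=320, slice_h=320, overlap=0.2):
    # stride between consecutive slice origins
    step_w = int(slice_w * (1 - overlap))
    step_h = int(slice_h * (1 - overlap))
    # number of slice origins needed to cover each axis
    n_w = max(1, math.ceil((image_w - slice_w) / step_w) + 1)
    n_h = max(1, math.ceil((image_h - slice_h) / step_h) + 1)
    return n_w * n_h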
- Visualize predicted bounding boxes and masks over the original image:
In [8]:
result.export_visuals(export_dir="demo_data/")
Image("demo_data/prediction_visual.png")
Out[8]:
3. Prediction Result¶
- Predictions are returned as a sahi.prediction.PredictionResult; you can access the object prediction list as follows:
In [9]:
object_prediction_list = result.object_prediction_list
In [10]:
object_prediction_list[0]
Out[10]:
ObjectPrediction< bbox: BoundingBox: <(np.float64(319.9983215332031), np.float64(317.1016845703125), np.float64(383.74927520751953), np.float64(365.41888427734375)), w: 63.750953674316406, h: 48.31719970703125>, mask: None, score: PredictionScore: <value: 0.9990587830543518>, category: Category: <id: 3, name: car>>
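Individual fields of an ObjectPrediction can be read directly; a small sketch (attribute names as used by recent SAHI versions):

pred = object_prediction_list[0]

# confidence score and predicted category
print(pred.score.value, pred.category.id, pred.category.name)

# bounding box corners in pixel coordinates
print(pred.bbox.minx, pred.bbox.miny, pred.bbox.maxx, pred.bbox.maxy)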
- ObjectPredictions can be converted to COCO annotation format:
In [11]:
result.to_coco_annotations()[:3]
Out[11]:
[{'image_id': None, 'bbox': [319.9983215332031, 317.1016845703125, 63.750953674316406, 48.31719970703125], 'score': 0.9990587830543518, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 3080}, {'image_id': None, 'bbox': [448.3526611328125, 305.8587646484375, 47.124786376953125, 38.234619140625], 'score': 0.9988723397254944, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 1801}, {'image_id': None, 'bbox': [762.3434448242188, 252.02978515625, 31.857330322265625, 32.469417572021484], 'score': 0.996906578540802, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 1034}]
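If you need to persist these annotations, one simple option (a sketch using the standard library rather than a SAHI utility) is to dump the list to a JSON file:

import json

coco_annotations = result.to_coco_annotations()
with open("demo_data/coco_annotations.json", "w") as f:
    # default=float guards against any lingering numpy scalar types
    json.dump(coco_annotations, f, indent=2, default=float)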
- ObjectPredictions can be converted to COCO prediction format:
In [12]:
result.to_coco_predictions(image_id=1)[:3]
Out[12]:
[{'image_id': 1, 'bbox': [319.9983215332031, 317.1016845703125, 63.750953674316406, 48.31719970703125], 'score': 0.9990587830543518, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 3080}, {'image_id': 1, 'bbox': [448.3526611328125, 305.8587646484375, 47.124786376953125, 38.234619140625], 'score': 0.9988723397254944, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 1801}, {'image_id': 1, 'bbox': [762.3434448242188, 252.02978515625, 31.857330322265625, 32.469417572021484], 'score': 0.996906578540802, 'category_id': 3, 'category_name': 'car', 'segmentation': [], 'iscrowd': 0, 'area': 1034}]
- ObjectPredictions can be converted to imantics annotation format:
In [ ]:
!pip install -U imantics
In [13]:
result.to_imantics_annotations()[:3]
Out[13]:
[<imantics.annotation.Annotation at 0x7f81f7545e50>, <imantics.annotation.Annotation at 0x7f81ef156b50>, <imantics.annotation.Annotation at 0x7f81ef1614c0>]
4. Batch Prediction¶
- Set model and directory parameters:
In [15]:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
detection_model = AutoDetectionModel.from_pretrained(
model_type='torchvision',
model=model,
confidence_threshold=0.4,
image_size=640,
device="cpu", # or "cuda:0"
load_at_init=True,
)
slice_height = 256
slice_width = 256
overlap_height_ratio = 0.2
overlap_width_ratio = 0.2
source_image_dir = "demo_data/"
- Perform sliced inference on the given folder:
In [16]:
predict(
detection_model=detection_model,
source=source_image_dir,
slice_height=slice_height,
slice_width=slice_width,
overlap_height_ratio=overlap_height_ratio,
overlap_width_ratio=overlap_width_ratio,
)
There are 3 listed files in folder: demo_data/
Performing inference on images: 0%| | 0/3 [00:00<?, ?it/s]
Performing prediction on 15 slices.
Performing inference on images: 33%|███▎ | 1/3 [00:04<00:08, 4.30s/it]
Prediction time is: 4271.57 ms
Performing prediction on 15 slices.
Performing inference on images: 67%|██████▋ | 2/3 [00:08<00:04, 4.38s/it]
Prediction time is: 4413.79 ms
Performing prediction on 20 slices.
Performing inference on images: 100%|██████████| 3/3 [00:14<00:00, 4.89s/it]
Prediction time is: 5902.43 ms
Prediction results are successfully exported to runs/predict/exp6
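The exported visuals are written to the run directory reported in the log (runs/predict/exp* by default); a small sketch to list what was produced, assuming that default location:

from pathlib import Path

# pick the most recently modified experiment folder under runs/predict/
run_dirs = sorted(Path("runs/predict").glob("exp*"), key=lambda p: p.stat().st_mtime)
latest_run = run_dirs[-1]
for f in sorted(latest_run.rglob("*")):
    print(f)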
5. Sliced Segmentation¶
SAHI also supports the torchvision Mask R-CNN instance segmentation models (maskrcnn_resnet50_fpn and maskrcnn_resnet50_fpn_v2):
In [19]:
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_Weights
maskrcnn_model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT)
detection_model_seg = AutoDetectionModel.from_pretrained(
model_type='torchvision',
model=maskrcnn_model,
confidence_threshold=0.5,
mask_threshold=0.8,
image_size=1333,
device="cpu", # or "cuda:0"
load_at_init=True,
)
im = read_image("demo_data/small-vehicles1.jpeg")
- Perform standard segmentation:
In [20]:
result = get_prediction(im, detection_model_seg)
In [21]:
result.export_visuals(export_dir="demo_data/")
Image("demo_data/prediction_visual.png")
Out[21]:
- Repeat for sliced segmentation:
In [22]:
result = get_sliced_prediction(
im,
detection_model_seg,
slice_height = 320,
slice_width = 320,
overlap_height_ratio = 0.2,
overlap_width_ratio = 0.2,
)
Performing prediction on 12 slices.
In [23]:
result.export_visuals(export_dir="demo_data/")
Image("demo_data/prediction_visual.png")
Out[23]:
Sliced inference picks up many more of the small vehicles than the standard full-image pass did.
- Observe the prediction format:
In [24]:
object_prediction_list = result.object_prediction_list
object_prediction_list[0]
Out[24]:
ObjectPrediction< bbox: BoundingBox: <(524, 225, 544, 240), w: 20, h: 15>, mask: <sahi.annotation.Mask object at 0x7ad2a3b7dbd0>, score: PredictionScore: <value: 0.9975883960723877>, category: Category: <id: 3, name: car>>
In [25]:
object_prediction_list[0].mask.segmentation
Out[25]:
[[526, 227, 526, 228, 525, 229, 524, 230, 524, 234, 524, 236, 524, 240, 525, 240, 526, 239, 528, 239, 537, 239, 538, 239, 539, 239, 540, 240, 541, 240, 542, 239, 543, 239, 544, 238, 544, 237, 544, 233, 543, 232, 543, 231, 541, 229, 541, 228, 540, 227, 539, 226, 538, 226, 537, 225, 536, 225, 535, 226, 534, 226, 533, 225, 532, 225, 531, 226, 529, 226, 528, 226, 527, 227]]
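The segmentation is a flat list of alternating x, y pixel coordinates. For instance, you can regroup it into (x, y) pairs and compute the polygon area with the shoelace formula (a plain-Python sketch, not a SAHI helper):

polygon = object_prediction_list[0].mask.segmentation[0]
points = list(zip(polygon[::2], polygon[1::2]))  # [(x1, y1), (x2, y2), ...]

# shoelace formula for the polygon area in square pixels
area = 0.0
for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
    area += x1 * y2 - x2 * y1
area = abs(area) / 2

print(len(points), "vertices, area ~", area, "px^2")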