[InferenceSlicer] - allow batch size inference
Description
Currently, `sv.InferenceSlicer` processes each slice in a separate callback call, which prevents inference with a batch size larger than 1. We can change this by:
- Batching slices: instead of submitting individual tasks for each slice, group slices into batches. `batch_size` can be a new parameter for the `InferenceSlicer` class.
- Modifying the callback: ensure the callback function can handle a batch of slices instead of a single slice, changing the callback signature from `Callable[[np.ndarray], Detections]` to `Callable[[List[np.ndarray]], List[Detections]]`.
- Collecting and merging results: after processing, collect and merge the results from the batches appropriately (a minimal sketch follows below).
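A minimal sketch of what the batched flow could look like, assuming the new `batch_size` parameter and the widened callback signature described above; the helper `run_batched` is purely illustrative and not part of the existing API:

```python
# Hypothetical sketch only: `run_batched` and `batch_size` are illustrative,
# not part of the current supervision API.
from typing import Callable, List

import numpy as np
import supervision as sv

# Proposed callback shape: one batch of slices in, one Detections per slice out.
BatchCallback = Callable[[List[np.ndarray]], List[sv.Detections]]


def run_batched(
    slices: List[np.ndarray],
    callback: BatchCallback,
    batch_size: int = 4,
) -> List[sv.Detections]:
    """Group slices into batches, invoke the callback once per batch,
    and flatten the per-slice results back into a single ordered list."""
    detections: List[sv.Detections] = []
    for start in range(0, len(slices), batch_size):
        batch = slices[start : start + batch_size]
        detections.extend(callback(batch))
    return detections
```

The merging step could then stay close to the current implementation, since it still receives one `Detections` object per slice.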
Additional
- Note: Please share a Google Colab with minimal code to test the new feature. We know it's additional work, but it will definitely speed up the review process. Each change must be tested by the reviewer. Setting up a local environment to do this is time-consuming. Please ensure that Google Colab can be accessed without any issues (make it public). Thank you! 🙏🏻
Hi, @inakierregueab 👋🏻 That is something we were considering but didn't implement due to time restrictions. Let me add some details to this issue. Maybe someone will pick it up.
Hi @SkalskiP, can I work on this issue if it is for beginners? Thanks
Hi, @Bhavay-2001 👋🏻 Do you already have experience with running model inference at different batch sizes?
Hi @SkalskiP, yes I think I can manage that. Can you please let me know how to proceed with this? Thanks
Great! Do you have any specific questions?
Hi @SkalskiP, how do I add the batch_size feature to the InferenceSlicer class? How can I test it in Google Colab? Any starting point that can help me get on track would be helpful.
I outlined the vital steps needed to add batch_size support in the task description. I think you should just try to implement it, get a first working version, and submit a PR so we can review it.
Hi @SkalskiP, can you please point me to a code sample that has already been implemented and provides the batch_size functionality?
@Bhavay-2001, I'm afraid we do not have a code sample. Implementing batch inference is exactly what this task is meant to deliver. :/
@SkalskiP, what I am thinking of doing is implementing a for loop over a batch of images, roughly like the sketch below. Each image is passed to the model, the detections are collected, and at the end the detections for the whole batch are returned.
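A rough sketch of that loop, using an Ultralytics model purely for illustration; each image in the batch is still passed to the model one at a time and the detections are collected:

```python
# Illustrative only: the model choice and function name are assumptions.
from typing import List

import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")


def batch_callback(images: List[np.ndarray]) -> List[sv.Detections]:
    # Run the model once per image in the batch and collect the detections.
    collected: List[sv.Detections] = []
    for image in images:
        result = model(image, verbose=False)[0]
        collected.append(sv.Detections.from_ultralytics(result))
    return collected
```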
Hi @SkalskiP, can you please review this PR?
Hi @SkalskiP, can you please review and let me know. Thanks
SkalskiP and I had a conversation about this - I'll take over for now.
Intermediate results:
- I've confirmed that threads help, especially when the model is run on the CPU. I see a 5-10x performance improvement.
- I've implemented the batched inference slicer, allowing users to input both images and lists of images.
- The threading implementation is kept; the docs point users to either `batch=N; threads=1` or `batch=1; threads=N`, depending on GPU / CPU needs (see the usage sketch below).
Testing more broadly, however, provides mixed results.
- On my machine, batching provides a speed boost for `ultralytics`, but does nothing for `transformers` (GPU) and `inference` (CPU, I believe).
- ~~Using `threads=8` slows down the `ultralytics`, `batch=1` case, compared to `threads=1`.~~ Only slower on my machine; in Colab it's faster.
Still checking transformers - there's an obvious speedup with GPU, but I ran out of memory when trying with batching.
Colab coming soon.
https://colab.research.google.com/drive/1j85QErM74VCSLADoGliM296q4GFUdnGM?usp=sharing
As you can see, in these tests it only helped the Ultralytics case.
Known insufficiencies:
- ~~Inference 1 model is fit for vehicle detection but is tested on an image with people.~~
- ~~No image to check how well it performed.~~
- ~~No tests for the auto-batch case (when `max_batch_size=-1`).~~
- ~~Missing examples in the docstring: normal vs. batch callback.~~
- No improvements to `nms` efficiency.
PR: #1108
@SkalskiP, Ready for review, details in #1108.