
[InferenceSlicer] - allow batch size inference

Open inakierregueab opened this issue 1 year ago • 21 comments

Description

Currently, sv.InferenceSlicer processes each slice in a separate callback call, which prevents inference with a batch size larger than 1. We can change this by:

  • Batching Slices: Instead of submitting individual tasks for each slice, group slices into batches. batch_size can be a new parameter for the InferenceSlicer class.
  • Modifying the Callback: Ensure the callback function can handle a batch of slices instead of a single slice. This means changing the callback signature from callback: Callable[[np.ndarray], Detections] to callback: Callable[[List[np.ndarray]], List[Detections]].
  • Collecting and Merging Results: After processing, collect and merge the results from all batches appropriately (a sketch follows this list).
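
For illustration, here is a minimal sketch of what the batched callback and the batch grouping could look like, assuming an Ultralytics model as the backend; the helper names `batched_callback` and `run_in_batches` are hypothetical and not part of the library:

```python
from typing import List

import numpy as np
import supervision as sv
from ultralytics import YOLO

# Example backend: any model that accepts a list of images works here.
model = YOLO("yolov8n.pt")


def batched_callback(crops: List[np.ndarray]) -> List[sv.Detections]:
    # One forward pass over the whole batch of slice crops.
    results = model(crops, verbose=False)
    return [sv.Detections.from_ultralytics(r) for r in results]


def run_in_batches(crops: List[np.ndarray], batch_size: int) -> List[sv.Detections]:
    # Group crops into batches, call the batched callback once per batch,
    # then flatten the per-crop detections back into a single list.
    # Shifting each crop's detections into full-image coordinates and
    # merging them remains the slicer's responsibility.
    detections: List[sv.Detections] = []
    for i in range(0, len(crops), batch_size):
        detections.extend(batched_callback(crops[i : i + batch_size]))
    return detections
```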

Additional

  • Note: Please share a Google Colab with minimal code to test the new feature. We know it's additional work, but it will definitely speed up the review process. Each change must be tested by the reviewer. Setting up a local environment to do this is time-consuming. Please ensure that Google Colab can be accessed without any issues (make it public). Thank you! 🙏🏻

inakierregueab avatar Jan 25 '24 21:01 inakierregueab

Hi, @inakierregueab 👋🏻 That is something we were considering but didn't implement due to time restrictions. Let me add some details to this issue. Maybe someone will pick it up.

SkalskiP avatar Jan 25 '24 22:01 SkalskiP

Hi @SkalskiP, can I work on this issue if it is for beginners? Thanks

Bhavay-2001 avatar Jan 26 '24 05:01 Bhavay-2001

Hi, @Bhavay-2001 👋🏻 Do you already have experience with running model inference at different batch sizes?

SkalskiP avatar Jan 26 '24 07:01 SkalskiP

Hi @SkalskiP, yes I think I can manage that. Can you please let me know how to proceed with this? Thanks

Bhavay-2001 avatar Jan 28 '24 09:01 Bhavay-2001

Great! Do you have any specific questions?

SkalskiP avatar Jan 28 '24 09:01 SkalskiP

Hi @SkalskiP, how should I add the batch_size feature to the InferenceSlicer class? How can I test it in Google Colab? Any starting point that can help me get on track would be helpful.

Bhavay-2001 avatar Jan 30 '24 15:01 Bhavay-2001

I outlined the vital steps needed to add batch_size support in the task description. I think you should just try to implement it, get a first working version, and submit a PR so we can review it.

SkalskiP avatar Jan 30 '24 16:01 SkalskiP

Hi @SkalskiP, can you please point me to an existing code sample that already implements the batch_size functionality?

Bhavay-2001 avatar Jan 30 '24 17:01 Bhavay-2001

@Bhavay-2001, I'm afraid we do not have a code sample. Implementing batch inference is exactly what this task is meant to produce. :/

SkalskiP avatar Jan 30 '24 17:01 SkalskiP

@SkalskiP, what I am thinking of doing is implementing a for loop over the batch of images: each image is passed to the model, its detections are collected, and at the end the detections for the whole batch are returned.
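
Roughly, that would be a sketch like the one below (the function name is hypothetical and an Ultralytics-style model is assumed). Note that a per-image loop satisfies the batched callback signature but does not yet exploit true batched inference on the GPU:

```python
from typing import List

import numpy as np
import supervision as sv


def loop_callback(images: List[np.ndarray], model) -> List[sv.Detections]:
    # Naive batching: run the model on each image in turn and collect the
    # per-image detections, returning one sv.Detections per input image.
    batch_detections: List[sv.Detections] = []
    for image in images:
        result = model(image, verbose=False)[0]
        batch_detections.append(sv.Detections.from_ultralytics(result))
    return batch_detections
```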

Bhavay-2001 avatar Jan 30 '24 17:01 Bhavay-2001

Hi @SkalskiP, can you please review this PR?

Bhavay-2001 avatar Feb 06 '24 14:02 Bhavay-2001

Hi @SkalskiP, can you please review and let me know? Thanks

Bhavay-2001 avatar Feb 16 '24 16:02 Bhavay-2001

SkalskiP and I had a conversation about this - I'll take over for now.

LinasKo avatar Apr 10 '24 13:04 LinasKo

Intermediate results:

  1. I've confirmed that threads help, especially when the model is run on the CPU. I see a 5-10x performance improvement.
  2. I've implemented the batched inference slicer, allowing users to input both images and lists of images.
  3. The threading implementation is kept, and the docs point users to either batch=N; threads=1 or batch=1; threads=N, depending on GPU / CPU needs (see the sketch below).
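
For example, those two configurations could look roughly like this. This is a sketch only; the exact parameter names used here (batch_size, thread_workers) are assumptions and may differ in the final PR:

```python
from typing import List

import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
image = cv2.imread("image.jpg")


def callback(crops: List[np.ndarray]) -> List[sv.Detections]:
    results = model(crops, verbose=False)
    return [sv.Detections.from_ultralytics(r) for r in results]


# GPU-oriented: one worker thread, several crops per forward pass.
gpu_slicer = sv.InferenceSlicer(callback=callback, batch_size=8, thread_workers=1)

# CPU-oriented: single-crop batches, several worker threads.
cpu_slicer = sv.InferenceSlicer(callback=callback, batch_size=1, thread_workers=8)

detections = gpu_slicer(image)
```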

Testing more broadly, however, provides mixed results.

  1. On my machine, batching provides a speed boost for ultralytics, but does nothing for transformers (GPU) or inference (CPU, I believe).
  2. ~~Using threads=8 slows down the ultralytics, batch=1 case, compared to threads=1.~~ Only slower on my machine; in Colab it's faster.

Still checking transformers - there's an obvious speedup with GPU, but I ran out of memory when I tried batching.

Colab coming soon.

LinasKo avatar Apr 10 '24 13:04 LinasKo

https://colab.research.google.com/drive/1j85QErM74VCSLADoGliM296q4GFUdnGM?usp=sharing

As you can see, in these tests it only helped the Ultralytics case.

Known insufficiencies:

  • ~~Inference 1 model is fit for vehicle detection but is tested on an image with people.~~
  • ~~No image to check how well it performed.~~
  • ~~No tests for auto-batch case (when max_batch_size=-1).~~
  • ~~Missing examples in docstring: normal vs. batch callback~~
  • No improvements to NMS efficiency.

LinasKo avatar Apr 10 '24 14:04 LinasKo

PR: #1108

LinasKo avatar Apr 10 '24 15:04 LinasKo

@SkalskiP, Ready for review, details in #1108.

LinasKo avatar Apr 11 '24 17:04 LinasKo