
I modified part of the code to enable parallel inference with num_batch > 1

Open · zhugelaozei opened this issue 1 year ago · 5 comments

Hi! Since I found that SAHI cannot perform parallel inference when using YOLOv11 for sliced inference, I modified part of the code and adapted it to the relevant parts of Ultralytics' code. Unfortunately, I have only adapted the Ultralytics part of the code so far. I hope this is helpful to you.

zhugelaozei · Dec 26 '24 14:12

Great @zhugelaozei! Can you please fix the formatting by:

  1. Installing the development dependencies: pip install -e ."[dev]"

  2. Running the code formatting script: python -m scripts.run_code_style format

fcakyon · Jan 04 '25 15:01

Does the num_batch here refer to running bounding box detections on multiple slices of the same image in a batch? Or is it running multiple images at one time in a batch? Could you provide an example of how to use it?

tonyreina · Jan 06 '25 21:01

Hello, I believe this is a very important feature. What is the current status on this?

eVen-gits · Feb 04 '25 08:02

Does the num_batch here refer to running bounding box detections on multiple slices of the same image in a batch? Or is it running multiple images at one time in a batch? Could you provide an example of how to use it?

@tonyreina hey. I believe it refers to batching slices of a single image. This is also the usual way to use SAHI: a large image, sliced into smaller ones.
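For context, the usual single-image workflow looks roughly like this (a minimal sketch; the model path, image path, and threshold are placeholders, and model_type may need to be "yolov8" instead of "ultralytics" depending on your SAHI version):

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load a YOLO model through SAHI's Ultralytics wrapper.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",   # "yolov8" on older SAHI releases
    model_path="yolo11n.pt",    # placeholder model file
    confidence_threshold=0.3,
    device="cuda:0",
)

# Slice the large image into 640x640 tiles, run detection on each tile,
# and merge the tile-level detections back into full-image coordinates.
result = get_sliced_prediction(
    "large_image.jpg",          # placeholder path to the large image
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

result.export_visuals(export_dir="output/")
```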

On that topic: I tried the modifications today and they work. Here are my observations:

  • It appears that the program crashes if perform_standard_pred is not set to False (see the sketch at the end of this comment).
  • Even though sliced prediction seems to work, the performance gains are lower than expected. For a 12768x9564 (122.1 MP) image with 640x640 slices, my prediction time goes from ~13 s to ~8 s (I don't have an accurate metric).

Regarding the latter, I suspect it is similar to what I observed when running multiple instances of inference in separate processes.

One reason is likely the data-loading limitation, but more than that, I suspect it has to do with some low-level locking of GPU operations. I am really not an expert in hardware utilization, so perhaps someone with more experience could shed some light on this topic.

Either way, it's a welcome addition and I hope this change is seriously considered.
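For anyone who wants to try the patch, here is a rough sketch of how I would expect the call to look, based on the observations above. perform_standard_pred is an existing get_sliced_prediction argument; the batching argument (written here as num_batch, after the issue title) exists only in the modified fork, so its exact name and placement may differ and it is left commented out:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",   # "yolov8" on older SAHI releases
    model_path="yolo11n.pt",    # placeholder model file
    confidence_threshold=0.3,
    device="cuda:0",
)

result = get_sliced_prediction(
    "large_image.jpg",          # placeholder path to the large image
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    # Skipping the extra full-image pass avoided the crash I saw:
    perform_standard_pred=False,
    # Hypothetical argument added by the patch (number of slices
    # per forward pass); only available in the modified fork:
    # num_batch=8,
)
```

On stock SAHI the commented line simply stays off; with the patch applied, enabling it is what should produce the speedup measured above.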

eVen-gits · Feb 07 '25 08:02

Hello. I did quite a bit of work on this a while ago. While I did not submit a PR for it, I thought I would post the link here for reference, in the hope that some of the implementation helps get this PR merged.

https://github.com/dceluis/sahi_batched

dceluis · Feb 13 '25 09:02