
Preprocess performance

BraginIvan opened this issue 3 years ago

📚 The doc issue

In the default image processing handler you process images one by one (https://github.com/pytorch/serve/blob/a4d5090e114cdbeddf5077a817a8cd02d129159e/ts/torch_handler/vision_handler.py#L38), i.e. synchronously. What is the best way to optimize this? Should I use a Pool here, or is there a better way?
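One way to sketch the Pool idea is to replace the sequential per-image loop with a thread pool. The names `transform_one` and `preprocess_parallel` below are hypothetical, not part of TorchServe, and the transform is a dummy standing in for decode/resize/normalize; threads can give real overlap here because Pillow and much of torchvision release the GIL during image work:

```python
from concurrent.futures import ThreadPoolExecutor

def transform_one(image_bytes):
    # Placeholder for the real per-image work (decode, resize, normalize).
    # Pillow/torchvision release the GIL for much of that work, so
    # threads overlap it; a process Pool would avoid the GIL entirely
    # at the cost of pickling each image.
    return bytes(b ^ 0xFF for b in image_bytes)  # dummy transform

def preprocess_parallel(batch, max_workers=8):
    # Parallel replacement for the sequential loop in
    # vision_handler.preprocess: map the transform over the batch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transform_one, batch))
```

Whether threads or processes win depends on how much of the transform holds the GIL, so it is worth benchmarking both.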

With batch size = 1, preprocessing is fast but the GPU is underutilized because the batch is so small. If I set BS=128, preprocessing (resize and other things) of 128 images is too slow and the whole pipeline becomes 2 times slower, although GPU utilization sometimes reaches 90% when a batch is ready. As far as I understand, min-workers and max-workers set the number of processes that handle separate batches, so in the default configuration I can't parallelize preprocessing within a batch.
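For context on the worker/batch knobs being described: each TorchServe worker is a separate handler process that receives its own batch, and batching per worker is controlled by `batchSize`/`maxBatchDelay`. A hedged, illustrative `config.properties` sketch (the model name `resnet` and the file name are made up):

```properties
# Illustrative snippet; keys follow TorchServe's model config format.
# minWorkers/maxWorkers scale whole handler processes (one batch each);
# batchSize/maxBatchDelay control how requests are aggregated per worker.
load_models=resnet.mar
models={\
  "resnet": {\
    "1.0": {\
        "marName": "resnet.mar",\
        "minWorkers": 2,\
        "maxWorkers": 4,\
        "batchSize": 32,\
        "maxBatchDelay": 50\
    }\
  }\
}
```

More workers trade GPU memory for concurrency across batches, but, as the issue notes, they do not parallelize the preprocessing loop inside a single batch.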

Suggest a potential alternative/fix

No response

BraginIvan avatar Jun 22 '22 15:06 BraginIvan

Thank you for your feedback @BraginIvan, this is something that we're working to improve. We had a couple of prototype PRs, like #1641 and #1545, to improve our story here, but if you have any requirements or thoughts please let me know. At a high level, my thinking is to either:

  1. Integrate more preprocessing libraries like DALI, ffcv, accimage
  2. Instead of iterating over rows of data, just instantiate a tensor directly on GPU
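The second option above can be sketched as follows. This is a minimal sketch, not TorchServe code: it assumes the images are already decoded into equally sized NumPy HWC uint8 arrays, stacks them into one contiguous host buffer, and does a single (optionally pinned, asynchronous) host-to-device copy instead of creating per-image tensors and stacking them on the GPU. The function name `batch_to_gpu` is hypothetical:

```python
import numpy as np
import torch

def batch_to_gpu(images, device=None):
    # `images`: list of equally sized HWC uint8 arrays, already decoded.
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    host = torch.from_numpy(np.stack(images))  # one contiguous NHWC uint8 block
    if device != "cpu":
        host = host.pin_memory()  # pinned memory enables an async copy
    batch = host.to(device, non_blocking=True)
    # NHWC uint8 -> NCHW float in [0, 1]; normalization runs on the device.
    return batch.permute(0, 3, 1, 2).float().div_(255.0)
```

The point is one bulk transfer and device-side normalization rather than a Python loop of small copies.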

Which direction we take will depend on benchmarks; there are a few quirks with each of these:

  1. DALI: requires changes to our backend; it bundles decoding, preprocessing optimizations, and a data loader, so it's hard to pick just one
  2. ffcv: requires an offline batch data transform, so to me it seemed better suited to training than inference
  3. accimage: was much faster (see benchmarks in #1545), but it wasn't clear what the long-term maintenance plan for the project is
  4. Leverage more optimizations in torch/vision, which has had some known issues: https://github.com/pytorch/vision/issues/3848
  5. Integrate optimizations directly in TS, or at the very least avoid operations known to be slow, like torch.stack
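On the last point, one way to restructure a collate step without `torch.stack` is to preallocate the batch tensor once and copy each image into its slot; the same `copy_` pattern can also target a preallocated GPU tensor. This is a sketch, not the TorchServe implementation, and whether it actually beats `torch.stack` depends on allocator behavior, so it should be benchmarked:

```python
import torch

def collate_preallocated(tensors):
    # torch.stack allocates a fresh batch tensor and copies every
    # element into it; doing the same explicitly lets the destination
    # be preallocated (and reused, or placed on the GPU) up front.
    n = len(tensors)
    out = torch.empty((n, *tensors[0].shape), dtype=tensors[0].dtype)
    for i, t in enumerate(tensors):
        out[i].copy_(t)
    return out
```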

msaroufim avatar Jun 22 '22 17:06 msaroufim

@BraginIvan We are investigating potential solutions for preprocessing the multiple images of one video frame in parallel.

  • Solution 1: preprocess multiple images in parallel outside TS, using a pipeline.
  • Solution 2: optimize the handler by using multiprocessing to parallelize preprocessing.
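The pipeline idea in solution 1 can be sketched with standard-library queues: a pool of preprocessing workers feeds a single inference consumer, so preprocessing of later batches overlaps with inference on earlier ones. All names here (`run_pipeline`, `transform`, `infer`) are hypothetical, and real TorchServe integration would differ:

```python
import queue
import threading

_SENTINEL = object()

def run_pipeline(batches, transform, infer, workers=4):
    # Two-stage pipeline: `workers` threads run `transform` on batches
    # while the main thread runs `infer`, so the GPU is not idle while
    # the next batch is being preprocessed.
    in_q, out_q = queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = in_q.get()
            if item is _SENTINEL:
                break
            idx, batch = item
            out_q.put((idx, transform(batch)))  # tag with index to restore order

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for idx, batch in enumerate(batches):
        in_q.put((idx, batch))
    for _ in threads:
        in_q.put(_SENTINEL)

    results = {}
    for _ in range(len(batches)):
        idx, pre = out_q.get()
        results[idx] = infer(pre)  # inference overlaps remaining preprocessing
    for t in threads:
        t.join()
    return [results[i] for i in range(len(batches))]
```

Swapping the threads for processes (solution 2) keeps the same shape but avoids the GIL at the cost of serializing each batch across the process boundary.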

lxning avatar Jul 08 '22 18:07 lxning