
Data Pipeline (Preprocessing videos for action recognition models like I3D) on GPU

Open alitirmizi23 opened this issue 2 years ago • 7 comments

I would like to know whether the preprocessing of videos for action recognition models (e.g., I3D) can be done on the GPU instead of the CPU. I've noticed that a 10s video clip can take up to 40-45s on my RTX 2080: inference itself takes merely 2-3s, but the preprocessing data pipeline alone takes about 40s. Is there a way to speed this up? Could GPUs be used for that in mmaction2?

alitirmizi23 avatar Jul 07 '22 06:07 alitirmizi23

@alitirmizi23 It is planned for the next half year. You are also very welcome to contribute!

hukkai avatar Jul 08 '22 07:07 hukkai

I’d like to try. Any pointers? Or anything particular you had in mind already that I can pick up?

alitirmizi23 avatar Jul 08 '22 07:07 alitirmizi23

@alitirmizi23 First I need to test which part of the data processing is the bottleneck and is suitable for the GPU. One thing I can think of is the resize step. Can you check which part is the bottleneck in your program?
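
For anyone wanting to run that check, here is a minimal sketch of a per-step timing harness. It is not code from mmaction2: `profile_pipeline` and the `fake_*` functions are hypothetical stand-ins, and the sleeps only simulate work. In practice each stand-in would be replaced by the real transform (decode, resize, crop, normalize) called on a real clip.

```python
import time
from collections import defaultdict


def profile_pipeline(steps, sample, repeats=3):
    """Run each (name, fn) step in order and return mean seconds per step."""
    totals = defaultdict(float)
    for _ in range(repeats):
        data = sample
        for name, fn in steps:
            start = time.perf_counter()
            data = fn(data)  # output of one step feeds the next
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


# Hypothetical stand-ins for the real transforms; sleeps simulate their cost.
def fake_decode(x):
    time.sleep(0.01)
    return x


def fake_resize(x):
    time.sleep(0.03)
    return x


def fake_normalize(x):
    return x


steps = [
    ("decode", fake_decode),
    ("resize", fake_resize),
    ("normalize", fake_normalize),
]
timings = profile_pipeline(steps, sample=None)
slowest = max(timings, key=timings.get)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
print("slowest step:", slowest)
```

If any step runs on the GPU, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels are launched asynchronously and the wall-clock time would otherwise be misleading.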

hukkai avatar Jul 08 '22 07:07 hukkai

The resize part definitely is one. I benchmarked videos of different resolutions: the high-quality ones take a lot more time than the low-quality (low-res) ones. Since spatial size (w x h) is the only difference between those videos, I'd imagine that's the most time-consuming part. I haven't compared the preprocessing steps individually.

alitirmizi23 avatar Jul 08 '22 07:07 alitirmizi23

I'm working with @alitirmizi23 on that. We would probably want to optimize the following on GPU: [DecordDecode, Resize, ThreeCrop, Normalize], from a 1920×1080 RTSP stream or higher. We are thinking of using:

It'd be very helpful to get your opinion on which path you think we should try first.

mustafah avatar Jul 08 '22 10:07 mustafah

@mustafah My initial idea is to use a data prefetcher to handle GPU Resize. Normalize is easy to implement and will be supported in the next version of mmaction2. For DecordDecode, I know it can be accelerated with GPUs, but I do not know how to use it with a data loader. Do you know of any PyTorch examples that use decord with a GPU? For the other points you mentioned, I will look into them over the weekend and come back to you as soon as possible. Thanks!
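
To make the Resize and Normalize part concrete, here is a rough sketch of how a batch of decoded frames could be resized and normalized on the GPU with plain PyTorch. This is an assumption about one possible approach, not mmaction2's implementation: `resize_and_normalize` is a hypothetical helper, the mean/std values are illustrative, and the code falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn.functional as F


def resize_and_normalize(frames, size, mean, std, device=None):
    """Resize and normalize a clip on `device`.

    frames: uint8 tensor of shape (T, H, W, C), as a decoder typically yields.
    Returns a float32 tensor of shape (T, C, size[0], size[1]).
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Move the raw uint8 frames first (smallest transfer), then convert.
    x = frames.to(device, non_blocking=True).permute(0, 3, 1, 2).float()
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    mean_t = torch.tensor(mean, device=device).view(1, -1, 1, 1)
    std_t = torch.tensor(std, device=device).view(1, -1, 1, 1)
    return (x - mean_t) / std_t


# Dummy clip: 8 frames at a reduced resolution to keep the example light.
frames = torch.randint(0, 256, (8, 108, 192, 3), dtype=torch.uint8)
out = resize_and_normalize(
    frames,
    size=(64, 64),
    mean=[123.675, 116.28, 103.53],  # illustrative per-channel values
    std=[58.395, 57.12, 57.375],
)
print(out.shape, out.dtype, out.device)
```

Bilinear interpolation here may not match the CPU resize bit for bit, so accuracy should be re-checked before swapping this into an existing pipeline.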

hukkai avatar Jul 08 '22 11:07 hukkai

I just found a comment in the decord issues saying that GPU decoding with decord and PyTorch can be slower than CPU decoding. Maybe we can only accelerate resize if the input video length is large, say 32.

hukkai avatar Jul 23 '22 15:07 hukkai