[Query] Using ctx=gpu performance difference. Is this expected?

Open TheShadow29 opened this issue 6 years ago • 3 comments

Hi. Thanks for the amazing repository.

I installed with the -DUSE_CUDA option and then tried the example here (https://github.com/zhreshold/decord/blob/master/examples/video_loader.ipynb). I averaged the wall time reported by %time over ten runs. The statement I timed was

vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0)

| cpu | gpu |
| --- | --- |
| 53 | 40 |

I also tried various shuffle strategies, but the timings were nearly identical as long as the device stayed the same.

Wondering if this is what is expected.

TheShadow29 avatar Jun 28 '19 22:06 TheShadow29

GPU frames need to be copied to the CPU before display, so that can be a considerable overhead. During training, if you are going to consume these frames directly on the GPU, it saves twice the traffic:

t(CPU->GPU) - t(GPU->CPU)

Does it make sense?

zhreshold avatar Jul 01 '19 19:07 zhreshold

I only used %time for the line vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0). If I understand correctly (let me know if I am incorrect), that shouldn't require transfer of frames from gpu to cpu.

TheShadow29 avatar Jul 01 '19 20:07 TheShadow29

OK, the line you pointed to actually does nothing but instantiate a VideoLoader; basically only some header parsing and preprocessing is done. You need to measure the real time elapsed while reading frames.
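A minimal sketch of how the two costs could be timed separately, assuming the loader can be iterated batch by batch as in the linked notebook. `time_loader` is a hypothetical helper name, not part of decord, and `videos`, `ctx`, `shape`, `interval`, and `skip` are the variables from the statement above.

```python
import time


def time_loader(make_loader, max_batches=None):
    """Return (construct_seconds, read_seconds, batches_read).

    make_loader is any zero-argument callable that returns an iterable of
    batches. Construction and iteration are timed separately so that the
    setup cost is not mistaken for the decoding cost.
    """
    t0 = time.perf_counter()
    loader = make_loader()  # construction only, no batches are read yet
    t1 = time.perf_counter()

    batches_read = 0
    for _batch in loader:  # batches are actually produced here
        batches_read += 1
        if max_batches is not None and batches_read >= max_batches:
            break
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1, batches_read


# Hypothetical usage with decord (requires decord and the video files):
# import decord as de
# construct_s, read_s, n = time_loader(
#     lambda: de.VideoLoader(videos, ctx=ctx, shape=shape,
#                            interval=interval, skip=skip, shuffle=0))
# print(f"construct: {construct_s:.3f}s, read {n} batches: {read_s:.3f}s")
```

Comparing `read_s` between the CPU and GPU contexts, rather than the construction time, should be closer to the measurement suggested here. If GPU work is queued asynchronously, the loop may return before decoding has finished, so treat the numbers as approximate.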

zhreshold avatar Jul 01 '19 20:07 zhreshold