decord [Query] Using ctx=gpu performance difference. Is this expected?

Open TheShadow29 opened this issue 6 years ago • 3 comments

Hi. Thanks for the amazing repository.

I built decord with the -DUSE_CUDA option and then tried the example here (https://github.com/zhreshold/decord/blob/master/examples/video_loader.ipynb). I averaged the wall time reported by %time over ten runs. The statement I timed was

vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0)

cpu	gpu
53	40

I also tried various shuffle strategies, but the timings were nearly identical as long as the same device was used.

Wondering if this is what is expected.

Jun 28 '19 22:06 TheShadow29

GPU frames need to be copied to the CPU before display, so that can be a considerable overhead. During training, if you are going to consume these frames directly on the GPU, it saves twice the traffic:

t(GPU->CPU) + t(CPU->GPU)

Does it make sense?

Jul 01 '19 19:07 zhreshold

I only used %time for the line vl = de.VideoLoader(videos, ctx=ctx, shape=shape, interval=interval, skip=skip, shuffle=0). If I understand correctly (let me know if I am incorrect), that shouldn't require transfer of frames from gpu to cpu.

Jul 01 '19 20:07 TheShadow29

OK, the line you pointed to does nothing but instantiate a VideoLoader; basically only some header and preprocessing work is done. You need to measure the real time elapsed while reading frames.

Jul 01 '19 20:07 zhreshold
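
For reference, the suggestion above (time the frame reads, not the constructor) could be sketched roughly as follows. This is a minimal sketch, not code from the decord examples: `time_loader` and `make_decord_loader` are hypothetical helper names, and it assumes that iterating a `VideoLoader` is what triggers decoding. The decord import is kept inside the helper so the timing function itself works with any iterable.

```python
import time


def time_loader(make_loader, consume=None):
    """Time loader construction and full iteration separately.

    make_loader: zero-argument callable returning an iterable of batches.
    consume: optional callable applied to every batch (e.g. to force a copy).
    Returns (construction_seconds, iteration_seconds, number_of_batches).
    """
    t0 = time.perf_counter()
    loader = make_loader()
    t1 = time.perf_counter()

    n_batches = 0
    for batch in loader:
        if consume is not None:
            consume(batch)
        n_batches += 1
    t2 = time.perf_counter()

    return t1 - t0, t2 - t1, n_batches


def make_decord_loader(videos, use_gpu, **kwargs):
    """Hypothetical helper: build the loader from the issue on CPU or GPU.

    kwargs are passed through unchanged (shape, interval, skip, ...).
    """
    import decord as de  # imported lazily so time_loader has no dependency

    ctx = de.gpu(0) if use_gpu else de.cpu(0)
    return de.VideoLoader(videos, ctx=ctx, shuffle=0, **kwargs)
```

With `videos`, `shape`, `interval`, and `skip` defined as in the notebook, a call such as `time_loader(lambda: make_decord_loader(videos, use_gpu=True, shape=shape, interval=interval, skip=skip))` would report the constructor time and the read time as separate numbers. Passing a `consume` callback that copies each batch to host memory would add the GPU->CPU transfer discussed earlier in the thread, so it is worth timing both with and without it.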
