
[Discussion] How does CNTK compute on the GPU?

Open faruknane opened this issue 4 years ago • 0 comments

Hi, I have been writing a deep learning library. I have been thinking a lot about how GPU and CPU computation should be organized, and I wonder how CNTK actually does its computation and manages to build the fastest RNN. Before going further, I want to say that I tried to find the answer by reading the CNTK source code, but it is so complicated that I couldn't understand it at all.

My library has Layers, which all have the sequence length as their outer dimension, as shown below. Each box of a layer is called a Term.

[figure: a layer unrolled along the sequence length, one Term box per time step]
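To make the structure concrete, here is a rough C++ sketch of what I mean (the names `Term` and `Layer` are placeholders for the idea, not my actual API):

```cpp
// Placeholder sketch of the structure described above: a Layer holds one Term
// per time step, and each Term knows which other Terms it depends on.
#include <vector>

struct Term {
    std::vector<Term*> inputs;   // Terms that must be computed before this one
    float* deviceData = nullptr; // output buffer on the GPU
    bool computed = false;
};

struct Layer {
    // Outer dimension is the sequence length: one Term per time step.
    std::vector<Term> terms;
};
```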

One way is to keep a single main thread on the CPU and parallelize the computation of each Term on the GPU. This way, I only compute one Term at a time; however, a single Term may not be large enough to fill the GPU, so utilization could be very low. A rough sketch of this is below.
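Roughly, the first approach looks like this in CUDA (the kernel body and sizes are placeholders for whatever a Term actually computes):

```cpp
#include <cuda_runtime.h>

// Stand-in for the real per-Term computation.
__global__ void computeTermKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

void computeSequentially(float** inputs, float** outputs, int numTerms, int n) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int t = 0; t < numTerms; ++t) {
        // Every launch goes to the default stream, so Term t+1 cannot start
        // before Term t has finished; a small Term leaves most of the GPU idle.
        computeTermKernel<<<blocks, threads>>>(inputs[t], outputs[t], n);
    }
    cudaDeviceSynchronize();
}
```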

The second approach I thought of is: whenever I am about to compute a Term, I first compute all the necessary Terms that it depends on, simultaneously. After those prerequisite Terms are done, I can use them to compute my Term. This way, I can push more than one Term to the GPU at a time. But there might be a problem here as well that I am not sure about. A rough sketch follows.
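A rough CUDA sketch of the second approach, assuming the same placeholder kernel as above and using one stream per prerequisite Term so that independent Terms can overlap on the GPU:

```cpp
#include <cuda_runtime.h>
#include <vector>

// Same placeholder kernel as in the previous sketch.
__global__ void computeTermKernel(const float* in, float* out, int n);

void computeWithPrerequisites(float** prereqIn, float** prereqOut, int numPrereqs,
                              float* termIn, float* termOut, int n) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    std::vector<cudaStream_t> streams(numPrereqs);
    for (int p = 0; p < numPrereqs; ++p) {
        cudaStreamCreate(&streams[p]);
        // Independent prerequisite Terms are launched on separate streams and
        // may run concurrently if one alone does not fill the GPU.
        computeTermKernel<<<blocks, threads, 0, streams[p]>>>(prereqIn[p], prereqOut[p], n);
    }
    // Wait for every prerequisite before computing the dependent Term.
    for (int p = 0; p < numPrereqs; ++p) {
        cudaStreamSynchronize(streams[p]);
        cudaStreamDestroy(streams[p]);
    }
    computeTermKernel<<<blocks, threads>>>(termIn, termOut, n);
    cudaDeviceSynchronize();
}
```

Instead of blocking the host with cudaStreamSynchronize, the dependency could also be expressed on the device with cudaEventRecord plus cudaStreamWaitEvent, which would keep the CPU thread free to enqueue further work.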

I wonder how CNTK handles this. Can someone give me a rough idea?

Thank you

faruknane · Apr 19 '20