NJUNMT-pytorch

[question] high GPU memory occupied

Open SkyAndCloud opened this issue 6 years ago • 4 comments

I used the default dl4mt_nist_zh2en.yaml to train a DL4MT model on the 1.25M NIST zh-en corpus and found that the code occupies ~10G of GPU memory, which is much higher than my in-house implementation. I checked that the DL4MT model has ~64 million parameters, which shouldn't need so much memory. Is there something wrong? Thanks!

SkyAndCloud • Oct 13 '18 15:10
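As a rough check of the parameter-count argument above, the following sketch estimates how much memory ~64 million parameters would take. It assumes 32-bit floats, and the "4 copies" figure (weights, gradients, and two Adam moment buffers) is an assumption for illustration; the thread does not say which optimizer or precision is used. `param_memory_gib` is a hypothetical helper, not part of NJUNMT-pytorch.

```python
GIB = 2 ** 30  # bytes per GiB


def param_memory_gib(n_params, bytes_per_value=4, copies=1):
    """Memory in GiB for `copies` tensors holding `n_params` values each."""
    return n_params * bytes_per_value * copies / GIB


n_params = 64_000_000  # parameter count quoted in the issue

# Weights alone, stored as 32-bit floats (assumption).
weights_only = param_memory_gib(n_params)

# Assumption: weights + gradients + two Adam moment buffers = 4 copies.
training_state = param_memory_gib(n_params, copies=4)

print(f"weights only:   {weights_only:.2f} GiB")
print(f"training state: {training_state:.2f} GiB")
```

Under these assumptions the weights come to about 0.24 GiB and the full training state to just under 1 GiB, so parameters alone would account for only a small share of a ~10G footprint.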

@SkyAndCloud Does your in-house implementation use PyTorch as well? Most GPU memory is consumed by the buffers used to build the graph and run the forward and backward computation. In fact, the model parameters account for only a small part of it.

whr94621 • Oct 13 '18 15:10
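To illustrate the point about forward/backward buffers, here is a back-of-the-envelope sketch of how memory for saved intermediate tensors might scale in a step-by-step decoder. Every number below (batch size, number of steps, hidden width, tensors saved per step) is a hypothetical placeholder, not a value taken from dl4mt_nist_zh2en.yaml, and `saved_activation_gib` is an illustrative helper rather than anything in the codebase.

```python
GIB = 2 ** 30  # bytes per GiB


def saved_activation_gib(batch, steps, width, tensors_per_step, bytes_per_value=4):
    """Rough size in GiB of intermediates kept for the backward pass.

    Assumes each decoding step saves `tensors_per_step` tensors of shape
    (batch, width) and that all of them stay alive until backward runs.
    """
    return batch * steps * width * tensors_per_step * bytes_per_value / GIB


# Hypothetical settings, chosen only to show the scaling behaviour.
estimate = saved_activation_gib(batch=80, steps=50, width=1024, tensors_per_step=20)
longer = saved_activation_gib(batch=80, steps=100, width=1024, tensors_per_step=20)

print(f"50 steps:  {estimate:.2f} GiB")
print(f"100 steps: {longer:.2f} GiB")
```

The estimate grows linearly with batch size, sequence length, and the number of tensors each step keeps, which is why two models with similar parameter counts can differ substantially in training memory. Wider tensors such as attention scores or vocabulary-sized logits are not modelled here and would add to the total.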

I use PyTorch, too. For a fair comparison, you can try OpenNMT-py, which is implemented in PyTorch 0.4, with a model containing a 1-layer BiGRU encoder, a 2-layer GRU decoder and MLP attention, whose parameter count is similar to DL4MT's. As I remember, it should use less than 8G of GPU memory.

SkyAndCloud • Oct 14 '18 05:10

@SkyAndCloud OK, I will compare with OpenNMT-py, but I need to point out that their implementation differs from ours: their decoder uses the cuDNN RNN, which is not possible for DL4MT, so we use a for loop + GRUCell. This may cause memory inefficiency during training. All in all, thank you for letting me know about the DL4MT memory issue. I will look into it soon and notify you as soon as there is any result. 😄

whr94621 • Oct 14 '18 08:10

I have a question: what do you mean by OpenNMT-py using the cuDNN RNN rather than a for loop + GRUCell? Do you mean OpenNMT-py's StdRNNDecoder implementation, which generates a whole sentence at a time rather than token by token like your for loop + GRUCell? If so, note that OpenNMT-py's InputFeedRNNDecoder also uses a for loop + RNN to generate outputs token by token, and it uses less memory than your implementation. Just a suggestion...

SkyAndCloud • Oct 14 '18 14:10