
TF implementation of RNN-scratch runs much slower than the other implementations

Open astonzhang opened this issue 5 years ago • 3 comments

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_recurrent-neural-networks/rnn-scratch.html

On the same machine, the TF notebook runs for about 9 minutes, while the MXNet and PyTorch versions finish in 3–5 minutes.

@abhinavsp0730, can you take a look? Thanks.

astonzhang avatar Sep 17 '20 04:09 astonzhang

Hi @astonzhang, thanks, good catch. I've raised a PR to fix the issue. That said, the TF notebooks run slower than the MXNet/PyTorch ones because:

  • Throughout the implementation we use a one-device strategy, meaning the model trains on a single GPU/CPU.
  • So far we haven't used the @tf.function decorator. Wrapping the training step with it compiles the Python function into a graph, which can give roughly a 15x speedup. Thanks
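As an aside, the @tf.function point above can be sketched as follows. This is a minimal illustrative example, not the notebook's actual training step; `train_step` and its inputs are placeholders, and the real speedup depends on the model and hardware.

```python
import tensorflow as tf

# Without @tf.function this would run eagerly, op by op; with the
# decorator, TensorFlow traces the function into a graph on the first
# call and replays the compiled graph on subsequent calls.
@tf.function
def train_step(x, w):
    # Stand-in for a real forward/backward pass.
    return tf.matmul(x, w)

x = tf.ones((2, 3))
w = tf.ones((3, 4))
out = train_step(x, w)
print(out.shape)  # (2, 4)
```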

abhinavsp0730 avatar Sep 17 '20 17:09 abhinavsp0730

Thanks. The RNN-scratch notebook trains on a single GPU in all the frameworks, so I guess this is probably not the root cause. Maybe @terrytangyuan can help you with your PR on this.

astonzhang avatar Sep 17 '20 19:09 astonzhang

@astonzhang I guess it doesn't, because here http://d2l.ai/chapter_convolutional-neural-networks/lenet.html (train_ch6) we have to explicitly define a one-device strategy in order to utilize the GPU. Thanks.
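For reference, the explicit one-device setup mentioned above looks roughly like this. A minimal sketch: the device-selection logic and the placeholder model are illustrative assumptions, not the actual train_ch6 code.

```python
import tensorflow as tf

# Explicitly pick a device; without pinning the strategy, the notebook
# may fall back to CPU even when a GPU is available.
device = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
strategy = tf.distribute.OneDeviceStrategy(device)

with strategy.scope():
    # Placeholder model; the real notebook builds LeNet here.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
```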

abhinavsp0730 avatar Sep 18 '20 03:09 abhinavsp0730