generating-reviews-discovering-sentiment

Tips for training on GPU?

Open jonny-d opened this issue 7 years ago • 2 comments

Hello,

I am trying to train this model in TensorFlow using the batch size and sequence length given in the paper (batches of 128 and a sequence length of 256), but I am struggling to fit the model with these hyperparameters. I can train a model with the same hidden size as reported in the paper (4096), but only with smaller batch and sequence-length settings. As I increase these hyperparameters I run into out-of-memory (OOM) errors, and debugging their causes is tricky; I am currently looking into tfdbg and tfprof. The crash happens during the session.run() call on my optimizer op.
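
For reference, a common way to cut activation memory without shrinking the batch is truncated backpropagation through time: run shorter chunks and feed each chunk's final hidden state back in as the next chunk's initial state. Below is a minimal TF 1.x sketch of that pattern; it uses a stock LSTMCell as a stand-in for the paper's mLSTM, and CHUNK_LEN, the learning rate, and chunks_of_corpus() are illustrative placeholders, not values from the paper or repo.

```python
import numpy as np
import tensorflow as tf

BATCH = 128
HIDDEN = 4096
CHUNK_LEN = 64   # backprop through 64 steps at a time instead of 256
VOCAB = 256      # byte-level inputs, as in the paper

inputs = tf.placeholder(tf.int32, [BATCH, CHUNK_LEN])
targets = tf.placeholder(tf.int32, [BATCH, CHUNK_LEN])
# The final state of one chunk is fed back in as the initial state of the next.
c_in = tf.placeholder(tf.float32, [BATCH, HIDDEN])
h_in = tf.placeholder(tf.float32, [BATCH, HIDDEN])

cell = tf.nn.rnn_cell.LSTMCell(HIDDEN)  # stand-in for the paper's mLSTM
outputs, state = tf.nn.dynamic_rnn(
    cell, tf.one_hot(inputs, VOCAB),
    initial_state=tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in),
    swap_memory=True)  # allow activations to spill to host RAM in backprop

logits = tf.layers.dense(outputs, VOCAB)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits))
train_op = tf.train.AdamOptimizer(5e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    c = np.zeros([BATCH, HIDDEN], np.float32)
    h = np.zeros([BATCH, HIDDEN], np.float32)
    for x_chunk, y_chunk in chunks_of_corpus():  # hypothetical generator
        _, (c, h) = sess.run(
            [train_op, state],
            feed_dict={inputs: x_chunk, targets: y_chunk,
                       c_in: c, h_in: h})
```

The memory stored for backprop scales with the number of unrolled steps, so halving the chunk length roughly halves activation memory; swap_memory=True on dynamic_rnn is another knob that trades speed for headroom.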

Could you share any details of how you implemented this model, or give any recommendations for creating an efficient implementation (e.g. building efficient input pipelines, device placement in the TF graph, common pitfalls, debugging tips)?
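
On the input-pipeline point, here is a minimal tf.data sketch (TF 1.4+; tf.contrib.data in earlier releases) that prefetches batches on the CPU so the GPU is not starved between steps. The file name, windowing scheme, and buffer sizes are hypothetical:

```python
import numpy as np
import tensorflow as tf

BATCH, SEQ_LEN = 128, 256

# Byte-level corpus stored as one long uint8 array (hypothetical file);
# slice it into (input, next-byte target) windows.
corpus = np.fromfile('corpus.bytes', dtype=np.uint8).astype(np.int32)
n = (len(corpus) - 1) // SEQ_LEN
x = corpus[:n * SEQ_LEN].reshape(n, SEQ_LEN)
y = corpus[1:n * SEQ_LEN + 1].reshape(n, SEQ_LEN)

dataset = (tf.data.Dataset.from_tensor_slices((x, y))
           .shuffle(buffer_size=10000)  # drop the shuffle if you carry
                                        # LSTM state across windows
           .repeat()
           .batch(BATCH)
           .prefetch(2))  # overlap host-side batching with GPU compute
inputs, targets = dataset.make_one_shot_iterator().get_next()
```

Note that from_tensor_slices embeds the arrays in the graph, which is fine for a sketch but hits the 2 GB graph limit on a large corpus; a placeholder-fed initializable iterator or TFRecords would be the usual fix there.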

I am using a Google Cloud Platform compute instance for my implementation.

Any tips or tricks to help with implementing this would be greatly appreciated!

Thanks, Jonny

jonny-d avatar Oct 17 '17 16:10 jonny-d

Are you aware that it took them 1 month to train the model?

eggie5 avatar Oct 23 '17 21:10 eggie5

Yes

jonny-d avatar Oct 24 '17 16:10 jonny-d