NaN loss and only OOV in the greedy output
The loss was initially decreasing, but then it reached NaN and stayed there. I am running on the SQuAD dataset, and the exact command I used is:
python train.py --train_tasks squad --device 0 --data ./.data --save ./results/ --embeddings ./.embeddings/ --train_batch_tokens 2000
So the only change is reducing --train_batch_tokens to 2000, since my GPU was running out of memory. I am attaching a screenshot. Is there anything I am missing? Should I try something else?
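In case it helps to pin down exactly where things go wrong, here is a minimal sketch (assuming a generic PyTorch training loop, not the actual decaNLP train.py; `training_step`, `batch`, and `dump_path` are placeholder names) of how one can catch the first iteration where the loss turns NaN and dump state for inspection:

```python
# Hypothetical sketch, not decaNLP's actual training loop: a generic PyTorch
# training step that stops at the first iteration where the loss becomes NaN,
# saving a snapshot so the failing step can be inspected later.
import torch


def training_step(model, batch, optimizer, iteration, dump_path='nan_debug.pth'):
    optimizer.zero_grad()
    loss = model(batch)  # assumes the model returns a scalar loss tensor

    if torch.isnan(loss).any():
        # Save the weights and the offending iteration for later inspection.
        torch.save({'iteration': iteration, 'model': model.state_dict()}, dump_path)
        raise RuntimeError('loss became NaN at iteration {}'.format(iteration))

    loss.backward()
    # Clipping gradients is a common guard when the loss spikes before going NaN.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```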

Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?
I ran into the same problem when I ran nvidia-docker run -it --rm -v $(pwd):/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --train_tasks squad --device 0". It first appeared at iteration_316800.