speaker_follower

Killed while evaluating

Open ZhangTianrong opened this issue 4 years ago • 0 comments

I changed the dataset from R2R to R4R, whose val_unseen split contains over 45k instructions. The training process is killed during evaluation, after about 15k of them have been evaluated. The machine I am using has 64 GB of memory and a Tesla V100 graphics card, and batch_size is set to 8. I am not sure what the bottleneck is. My guess is that the result dictionary is taking up too much memory. Would it be good practice to release the memory after every 10k results or so?
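To make the question concrete, here is a minimal sketch of what "releasing the memory every 10k results" could look like: results are buffered and written to a JSON Lines file at a fixed interval, so they never all sit in memory at once. It is not taken from the speaker_follower code. The names `evaluate_streaming`, `fake_batches`, and `flush_every`, and the shape of each result record, are hypothetical, and the sketch assumes the result dictionary really is what grows, which I have not confirmed.

```python
import json
import os
import tempfile


def fake_batches(n_results, batch_size=8):
    """Hypothetical stand-in for the evaluation loop: yields lists of result dicts."""
    batch = []
    for i in range(n_results):
        # Assumed record shape; the real results may hold different fields.
        batch.append({"instr_id": str(i), "trajectory": [[i, 0.0, 0.0]]})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def evaluate_streaming(batches, out_path, flush_every=10000):
    """Write results to a JSON Lines file in chunks instead of keeping them all.

    Returns the number of results written.
    """
    buffer = []
    total = 0
    with open(out_path, "w") as f:
        for batch in batches:
            buffer.extend(batch)
            if len(buffer) >= flush_every:
                f.writelines(json.dumps(r) + "\n" for r in buffer)
                total += len(buffer)
                buffer.clear()  # drop references so the memory can be reclaimed
        # Write whatever is left after the last batch.
        f.writelines(json.dumps(r) + "\n" for r in buffer)
        total += len(buffer)
    return total


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
    written = evaluate_streaming(fake_batches(45000), path)
    print(written, "results written to", path)
```

With this layout the scoring step would read the file back one line at a time rather than indexing a dictionary, so it only helps if the metrics can be computed per instruction or accumulated incrementally.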

ZhangTianrong, Nov 11 '20 14:11