speaker_follower
Killed while evaluating
I changed the dataset from R2R to R4R, which contains over 45k instructions in the val_unseen split. The evaluation process is killed after about 15k of them have been evaluated. The machine I am using has 64 GB of memory and a Tesla V100 graphics card, and batch_size is set to 8. I am not sure what the bottleneck is here. My guess is that the results dictionary is taking up too much memory. Would it be good practice to release that memory every ~10k results, roughly like the sketch below?
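For reference, this is a minimal sketch of what I mean by "releasing every ~10k results": flush the accumulated results to disk in chunks instead of holding all ~45k entries in one dict. The helper name `run_single_eval` and the output path are hypothetical, not part of the speaker_follower code.

```python
import json

def evaluate_in_chunks(instr_ids, run_single_eval, out_path, flush_every=10000):
    """Evaluate instructions one by one, flushing results to disk in chunks.

    run_single_eval: hypothetical callable that returns one result dict
    per instruction id (stands in for the real evaluation step).
    """
    results = {}
    with open(out_path, "w") as f:
        for i, instr_id in enumerate(instr_ids, start=1):
            results[instr_id] = run_single_eval(instr_id)
            if i % flush_every == 0 or i == len(instr_ids):
                # Write the current chunk as one JSON line, then clear it
                # so the in-memory dict never grows past `flush_every` entries.
                f.write(json.dumps(results) + "\n")
                results.clear()
```

Would chunked flushing like this be the right fix, or is the memory more likely going somewhere else (e.g. cached tensors or the dataloader)?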