BERT-pytorch
Why not use torch.no_grad when evaluating test data?
The way the trainer is set up, the iteration used for train and test is the same, except that backpropagation only runs in the train step. But one other thing I typically see that differs between test and train is that the test batch is run inside with torch.no_grad(), so that no gradients are tracked, and the model is put into model.eval() mode so that, for example, dropout is not applied. Was there any reason this isn't done here?
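The two mechanisms are easy to mix up, so here is a small illustrative sketch (plain PyTorch, not code from this repo). torch.no_grad() only stops autograd from recording operations. It is module.eval() that switches dropout off. The tensor size and p=0.5 are arbitrary choices made so the effect is easy to see.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode + no_grad: autograd is off, but dropout is still active.
drop.train()
with torch.no_grad():
    y_no_grad = drop(x)  # each entry is either 0.0 (dropped) or 2.0 (kept, scaled by 1 / (1 - p))

# Evaluation mode: dropout becomes the identity function.
drop.eval()
y_eval = drop(x)  # identical to x

print(sorted(set(y_no_grad.tolist())))
print(torch.equal(y_eval, x))
```

In practice the test loop usually wants both: eval() for correct inference behavior and no_grad() to avoid building a graph.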
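I think it should use torch.no_grad(). Otherwise autograd builds a graph and keeps intermediate activations alive for a backward pass that never runs, which wastes GPU memory and can lead to out-of-memory errors.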
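A minimal sketch of an evaluation step that combines both, assuming a generic classifier. The small nn.Sequential model, the evaluate helper and the random batches are hypothetical stand-ins, not the BERT-pytorch trainer's actual code. The helper saves and restores the model's previous mode, so it can be called from inside a training loop.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model: any nn.Module with dropout behaves the same way.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()


def evaluate(model, batches):
    """Return the mean loss over `batches` without building an autograd graph."""
    was_training = model.training
    model.eval()              # disable dropout for deterministic evaluation
    total, n = 0.0, 0
    with torch.no_grad():     # no graph is recorded, so no activations are kept for backward
        for inputs, labels in batches:
            loss = criterion(model(inputs), labels)
            total += loss.item()
            n += 1
    model.train(was_training)  # restore whatever mode the caller was in
    return total / max(n, 1)


# Hypothetical random test data: 3 batches of 4 examples each.
batches = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(3)]
print(evaluate(model, batches))

# Outputs computed under no_grad carry no graph; outputs computed outside it do.
with torch.no_grad():
    print(model(torch.randn(4, 16)).requires_grad)
print(model(torch.randn(4, 16)).requires_grad)
```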