equinox
Shouldn't we renormalize dropout layers when `inference=True`?
I was looking at the BERT example and noticed that dropout simply returns its input unchanged when called with `inference=True`. As I understand it, though, the activations need to be renormalized at inference time so that their expected values match the ones seen during training.
Perhaps I'm missing something?
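The question depends on which dropout convention the layer uses. The sketch below is a minimal JAX illustration of the two common conventions. The function names `classic_dropout` and `inverted_dropout` are hypothetical and are not part of the Equinox API. Classic dropout leaves kept units unscaled during training and multiplies by (1 - p) at inference. Inverted dropout instead divides kept units by (1 - p) during training, so inference is the identity. In both cases the mean output agrees between training and inference. Whether `eqx.nn.Dropout` uses the inverted form is an assumption that should be checked against its source.

```python
import jax
import jax.numpy as jnp


def classic_dropout(x, p, key, inference):
    """Hypothetical helper: no scaling while training, multiply by (1 - p) at inference."""
    if inference:
        # Scale down so the output matches the training-time expectation (1 - p) * x.
        return x * (1.0 - p)
    keep = jax.random.bernoulli(key, 1.0 - p, x.shape)
    return jnp.where(keep, x, 0.0)


def inverted_dropout(x, p, key, inference):
    """Hypothetical helper: scale kept units by 1 / (1 - p) while training, identity at inference."""
    if inference:
        # No renormalization needed here: training already preserved the expectation.
        return x
    q = 1.0 - p
    keep = jax.random.bernoulli(key, q, x.shape)
    return jnp.where(keep, x / q, 0.0)


x = jnp.ones((100_000,))
p = 0.3
key = jax.random.PRNGKey(0)

# Empirical means: training vs. inference for each convention.
train_classic = classic_dropout(x, p, key, inference=False).mean()
infer_classic = classic_dropout(x, p, key, inference=True).mean()
train_inverted = inverted_dropout(x, p, key, inference=False).mean()
infer_inverted = inverted_dropout(x, p, key, inference=True).mean()

print("classic: ", float(train_classic), float(infer_classic))
print("inverted:", float(train_inverted), float(infer_inverted))
```

With classic dropout both means sit near 1 - p; with inverted dropout both sit near 1. Either way, the inference path only needs explicit renormalization when training did not already apply the 1 / (1 - p) factor.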