Asher Moldwin

Results: 1 comment by Asher Moldwin

Hi James, can you clarify why BERT models need the raw examples in order to calculate gradients rather than just the activations from a given bottleneck?