Asher Moldwin
Hi James, can you clarify why BERT models need the raw examples in order to calculate gradients rather than just the activations from a given bottleneck?