long-range-arena
Q's on Performer & Text Classification
Thanks for the great work. I have a couple of questions from trying to reproduce the Performer results on the byte-level text classification task:
- Which kernel function are you using (softmax approximation or ReLU)?
- I found the training to be very unstable. Do you take the final model after 20K steps, or do you take the best checkpoint?
- With the learning rate scheduler you use, the learning rate is 0 when the step is 0, isn't it? Shouldn't the training loop instead start at step 1, i.e. for step in range(1, X), in https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/text_classification/train.py
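
To make the last point concrete, here is a minimal sketch of what I mean. It assumes the schedule is a constant multiplied by a linear warmup and an inverse square root decay. The function name learning_rate and the values of base_lr and warmup_steps are placeholders I made up for illustration, not taken from the repository config.

```python
import math


def learning_rate(step, base_lr=0.05, warmup_steps=8000):
    """Hypothetical schedule: constant * linear_warmup * rsqrt_decay.

    base_lr and warmup_steps are illustrative values, not the repo's settings.
    """
    # Linear warmup factor: exactly 0.0 at step 0, reaches 1.0 at warmup_steps.
    warmup = min(1.0, step / warmup_steps)
    # Inverse square root decay, held constant during the warmup phase.
    decay = 1.0 / math.sqrt(max(step, warmup_steps))
    return base_lr * warmup * decay


# A loop starting at step 0 performs its first update with a zero learning rate.
print(learning_rate(0))
# A loop starting at step 1 always uses a strictly positive learning rate.
print(all(learning_rate(s) > 0.0 for s in range(1, 100)))
```

Under these assumptions the first update of a loop that starts at step 0 has no effect on the parameters, which is why starting at step 1 seems preferable to me.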
Looking forward to the implementations of the other models, thanks!
FYI: the implementations of all models are available now.