bigbird
Learning rate mentioned in paper vs run_summarization.py
Hi,
The learning rate mentioned in the paper for summarization is around 3e-5, but in run_summarization.py the default value of the learning rate flag is 0.32. The roberta_base.sh script does not override the learning rate either, so the default is what gets used.
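To make the concern concrete, here is a minimal sketch of how I understand the behaviour. It uses Python's argparse as a stand-in for the script's actual flag handling; the `parse_flags` helper and the parser are hypothetical and not code from the repository. Unless the launch script passes the flag explicitly, the default is what the run ends up with:

```python
import argparse


def parse_flags(argv):
    """Hypothetical stand-in for the flag definitions in run_summarization.py."""
    parser = argparse.ArgumentParser()
    # Default taken from the value reported in this issue (0.32).
    parser.add_argument("--learning_rate", type=float, default=0.32)
    return parser.parse_args(argv)


# A launch script that does not pass the flag: the default applies.
default_run = parse_flags([])

# A launch script that overrides the flag with the value from the paper.
paper_run = parse_flags(["--learning_rate=3e-5"])

print("without override:", default_run.learning_rate)
print("with override:   ", paper_run.learning_rate)
```

If the paper's value is the intended one, I would have expected the shell script to pass an override along these lines.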
Could anyone please clarify which value is intended? The learning rate is crucial for models like these.
Thanks