fairseq
Long-standing bug in Adafactor optimizer when beta1 > 0
There seems to be an issue with the Adafactor optimizer, found here, when beta1 > 0: https://github.com/facebookresearch/fairseq/blob/ecbf110e1eb43861214b05fa001eff584954f65a/fairseq/optim/adafactor.py#L66
Please find a detailed description here: https://github.com/huggingface/transformers/issues/34506