Results: 5 issues by Ozan Ciga

[stale]

**Describe the bug** - Using the latest version of transformers and deepspeed. - It is possible to fit a ~1.5B parameter model and train it from scratch with batch sizes...

bug
training

Using accelerate[huggingface] and the latest version of adapter-transformers. I only worked with the `BertAdapterModel`, so I cannot speak to other models' behavior. I think the easiest way to test this without having to...

bug

This method is crucial in distributed training, yet I found its name very confusing. As for the manual, the only reference to it seems to be ``You then set the epoch...

documentation

Thank you for your work. I am using it for captioning images. I didn't get a chance to review it all, but I noticed a few issues. I'm exemplifying one of...