Parcollet Titouan
@jamfly if you have enough resources, you can try the 1B-2B XLS-R models :-P
Thanks @jamfly! Cool contribution as always. Dynamic Batching should be faster if done properly. Did you follow our tutorial on that? If yes, I'll ping someone to help you further...
@popcornell maybe you can help with that?
Hi, thanks for this PR. I wonder, however, whether this could be integrated in a more general way, as we try to avoid having recipe-specific scripts. Do you think...
Hi, this is typically due to OOM. Reduce the batch size and increase the gradient_accumulation to compensate.
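To illustrate the advice above, here is a minimal, framework-free sketch of why raising the accumulation count compensates for a smaller batch. The function names (`accumulation_steps`, `mse_grad`, `accumulated_grad`) and the toy one-weight model are hypothetical and only for illustration; the actual hyperparameter name in a given recipe may differ from `gradient_accumulation`.

```python
import math


def accumulation_steps(target_batch_size, micro_batch_size):
    """Number of micro-batches needed to cover the original (target) batch size."""
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return math.ceil(target_batch_size / micro_batch_size)


def mse_grad(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1 / steps.

    This matches the full-batch gradient when micro_batch_size divides
    len(data) evenly; with a smaller last chunk it is only an approximation.
    """
    steps = accumulation_steps(len(data), micro_batch_size)
    total = 0.0
    for i in range(steps):
        chunk = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, chunk) / steps
    return total


# Toy data: 8 (x, y) pairs, processed 2 at a time instead of all at once.
data = [(float(x), 3.0 * x + 1.0) for x in range(1, 9)]
full = mse_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, micro_batch_size=2)
print(accumulation_steps(8, 2), abs(full - accumulated) < 1e-9)
```

In this sketch, only one micro-batch is processed at a time, which is what lowers peak memory, while the weight update still reflects the original batch size.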
cc @anautsch maybe as well
Hi, I am unsure. The expected behavior would be to return the predictions in "predictions", so maybe we did something wrong. If so, we would be glad to see...
@mravanelli do you know if Abdel is reviewing this, or should we do it? NAR is quite important.
I'll review the code of this PR, but I am unsure about running the experiment... We should try to add it to the new version, I think, because it looks ready.
Dear @Wecan-Huang0602, thank you so much for this work. Please see my longest comment above, which requires the biggest part of the work. If you think you can achieve this,...