Accelerate x Trainer issue tracker:
A bunch of issues are a bit stale, and @SunMarc + @muellerzr are a bit short on bandwidth! Thus we would love to have community support to solve the following:
Help needed
- [ ] #27830
- [ ] #30911
- [ ] #30702
- [ ] #30239
- [ ] #29348
- [ ] #33157
- [ ] #28469
- [ ] #30663
- [ ] #30811
- [ ] #30819
- [ ] #31313
- [ ] #31457
- [ ] #30859
- [ ] #31897
- [ ] #30340
- [ ] #31892
- [x] #28914
- [x] #29518
- [ ] #30277
- [ ] #33376
Feature request
- [ ] #30725
Replied with potential fix and following
- [ ] #31734 followed by @irislin1006
- [ ] #32312 followed by @irislin1006
- [x] #31818 followed by @mekkcyber
- [x] #31439 followed by @nnilayy
- [ ] #28124 followed by @muellerzr and @WizKnight
- [ ] #30767 followed by @SunMarc
- [ ] #33147 followed by @SunMarc
- [x] #33400 followed by @SunMarc
- [ ] #26413 followed by @muupan, @muellerzr and @SunMarc
- [x] #28808 followed by @Ben-Schneider-code
- [x] #31357 followed by @mekkcyber
- [x] #33733 resolved by the author
- [ ] #30330 followed by @SunMarc
- [ ] #30913 followed by @mekkcyber
- [ ] #27487 followed by @SunMarc
- [ ] #33336 followed by @MekkCyber
- [ ] #25695 followed by @MekkCyber
- [ ] #31278 followed by @muellerzr
- [ ] #32427 followed by @muellerzr and @SunMarc
- [ ] #31034 followed by @muellerzr
- [ ] #30822 followed by @muellerzr
- [ ] #31867 followed by @Ben-Schneider-code and @SunMarc
@ArthurZucker Hey there!👋 I'm new to this repository and excited to learn and contribute. Please let me know if there are any good starting points or tasks where I can be of assistance.
Any of these issues that have the Good First Issue label should be fairly easy! 🤗
Hi @ArthurZucker, I'm a first time contributor, but I would love to take issue https://github.com/huggingface/transformers/issues/31734 as a start 👍
[Update on 2024/09/07] Handled and replied in the issue
Hi there👋 @ArthurZucker, Handled issue #31439, hope that helps🤗.
Hi there👋 @ArthurZucker, I'll handle the issue #28124
Hi there👋 @ArthurZucker, I would like to take https://github.com/huggingface/transformers/issues/32312 😀
[Update on 2024/09/09] Handled and replied in the issue
cc @matthewdouglas
I opened PR #31268 as a fix for issue #30819. I think some discussion is needed there, @amyeroberts
Hey @amyeroberts, just wanted to check in on issue #28124. It seems like @muellerzr already tackled it with his fix in #30169. Should I still work on this further, or is it good to go as is?
Thanks!
Hi @WizKnight - best to ask @muellerzr (ideally on the relevant PR / issue to avoid pinging everyone here) on the status of those. I can see in #30169 the PR wasn't merged due to inactivity -- pending a response to these questions.
In general, if something has just been closed by the GitHub stale bot, and not because of a clear decision not to pursue the PR or a clear rejection from the review process, you're free to pick up the work :)
cc @mekkcyber
Hey @SunMarc and @muellerzr,
I'd love to contribute to this project and help resolve some of the issues mentioned here, especially the DeepSpeed Zero3-related bugs. I’ve already gone through some of the issues and identified potential starting points for solutions. I'll be focusing on these:
- Training hangs at the first gradient syncing of an MoE model while using DeepSpeed (#30911)
- Trainer doesn't save evaluation metrics (#33733)
- CUDA RuntimeError: Unspecified Launch Failure during Training (#30913)

I'll submit PRs with proposed fixes and updates soon. Thank you for the opportunity to contribute!
Also, if there are any specific guidelines or areas where help is most needed, feel free to point me in the right direction!
Looking forward to collaborating on this during Hacktoberfest 🎉
Hey @SunMarc and @muellerzr, I would be happy to contribute to the issue "Trainer doesn't save evaluation metrics" (#33733)
Awesome! Just added the tag to make sure it works for everyone! 🥳
I want to work on #29348. Please assign this to me.
Hey @ArthurZucker, will this be counted in Hacktoberfest?
Yes, given that there is the tag! We don't assign issues: the first PR that is up will be reviewed. If it goes stale, anyone can take it, and if no PR is linked, you can also create one 🤗
Hi, I don't think https://github.com/huggingface/transformers/issues/28469 has been fixed yet. Facing this even in 4.46.3.
I have a hacky workaround here: custom_trainer. It works on the repro script, but might not cover all cases.
Just reopened! If you have a fix, would you like to open a PR so that we can have a look? Thanks!
Hi, I'm new here and excited to learn and contribute.
Hey, are there any issues left to be assigned? If yes, can you assign one to me?
Hi @ArthurZucker, is there anything remaining I can contribute here for Hacktoberfest? Happy to be a part of such a great repo. Thanks :)
This issue can be closed as we haven't updated it for a long time. Please check the other issues in the repo!