Accelerate x Trainer issue tracker:

Open ArthurZucker opened this issue 1 year ago • 16 comments

A bunch of issues are a bit stale, and @SunMarc + @muellerzr are a bit short on bandwidth! Thus we would love to have community support to solve the following:

Help needed

  • [ ] #27830
  • [ ] #30911
  • [ ] #30702
  • [ ] #30239
  • [ ] #29348
  • [ ] #33157
  • [ ] #28469
  • [ ] #30663
  • [ ] #30811
  • [ ] #30819
  • [ ] #31313
  • [ ] #31457
  • [ ] #30859
  • [ ] #31897
  • [ ] #30340
  • [ ] #31892
  • [x] #28914
  • [x] #29518
  • [ ] #30277
  • [ ] #33376

Feature request

  • [ ] #30725

Replied with potential fix and following

  • [ ] #31734 followed by @irislin1006
  • [ ] #32312 followed by @irislin1006
  • [x] #31818 followed by @mekkcyber
  • [x] #31439 followed by @nnilayy
  • [ ] #28124 followed by @muellerzr and @WizKnight
  • [ ] #30767 followed by @SunMarc
  • [ ] #33147 followed by @SunMarc
  • [x] #33400 followed by @SunMarc
  • [ ] #26413 followed by @muupan, @muellerzr and @SunMarc
  • [x] #28808 followed by @Ben-Schneider-code
  • [x] #31357 followed by @mekkcyber
  • [x] #33733 resolved by the author
  • [ ] #30330 followed by @SunMarc
  • [ ] #30913 followed by @mekkcyber
  • [ ] #27487 followed by @SunMarc
  • [ ] #33336 followed by @MekkCyber
  • [ ] #25695 followed by @MekkCyber
  • [ ] #31278 followed by @muellerzr
  • [ ] #32427 followed by @muellerzr and @SunMarc
  • [ ] #31034 followed by @muellerzr
  • [ ] #30822 followed by @muellerzr
  • [ ] #31867 followed by @Ben-Schneider-code and @SunMarc

ArthurZucker avatar Sep 06 '24 10:09 ArthurZucker

@ArthurZucker Hey there!👋 I'm new to this repository and excited to learn and contribute. Please let me know if there are any good starting points or tasks where I can be of assistance.

WizKnight avatar Sep 06 '24 11:09 WizKnight

Any of these issues that have the Good First Issue label should be fairly easy! 🤗

ArthurZucker avatar Sep 06 '24 12:09 ArthurZucker

Hi @ArthurZucker, I'm a first-time contributor, but I would love to take issue https://github.com/huggingface/transformers/issues/31734 as a start 👍

[Update on 2024/09/07] Handled and replied in the issue

irislin1006 avatar Sep 06 '24 15:09 irislin1006

Hi there👋 @ArthurZucker, Handled issue #31439, hope that helps🤗.

nnilayy avatar Sep 06 '24 22:09 nnilayy

Hi there👋 @ArthurZucker, I'll handle the issue #28124

WizKnight avatar Sep 07 '24 09:09 WizKnight

Hi there👋 @ArthurZucker, I would like to take https://github.com/huggingface/transformers/issues/32312 😀

[Update on 2024/09/09] Handled and replied in the issue

irislin1006 avatar Sep 08 '24 02:09 irislin1006

cc @matthewdouglas

SunMarc avatar Sep 10 '24 15:09 SunMarc

I had opened PR #31268 as a fix for issue #30819. I think some discussion is needed there @amyeroberts

godspeed5 avatar Sep 13 '24 19:09 godspeed5

Hey @amyeroberts, just wanted to check in on issue #28124. It seems like @muellerzr already tackled it with his fix in #30169. Should I still work on this further, or is it good to go as is?

Thanks!

WizKnight avatar Sep 16 '24 13:09 WizKnight

Hi @WizKnight - best to ask @muellerzr (ideally on the relevant PR / issue to avoid pinging everyone here) on the status of those. I can see in #30169 the PR wasn't merged in due to inactivity -- pending a response to these questions.

In general, if something has just been closed by the GitHub stale bot, and not because of a clear decision not to pursue the PR or a clear rejection from the review process, you're free to pick up the work :)

amyeroberts avatar Sep 16 '24 18:09 amyeroberts

cc @mekkcyber

SunMarc avatar Sep 27 '24 16:09 SunMarc

Hey @SunMarc and @muellerzr,

I'd love to contribute to this project and help resolve some of the issues mentioned here, especially the DeepSpeed Zero3-related bugs. I’ve already gone through some of the issues and identified potential starting points for solutions. I'll be focusing on these:

  • Training hangs at the first gradient syncing of an MoE model while using DeepSpeed (#30911)
  • Trainer doesn't save evaluation metrics (#33733)
  • CUDA RuntimeError: Unspecified Launch Failure during Training (#30913)

I'll submit PRs with proposed fixes and updates soon. Thank you for the opportunity to contribute!

Also, if there are any specific guidelines or areas where help is most needed, feel free to point me in the right direction!

Looking forward to collaborating on this during Hacktoberfest 🎉

P-Potdar avatar Oct 01 '24 19:10 P-Potdar

Hey @SunMarc and @muellerzr, I would be happy to contribute to the issue "Trainer doesn't save evaluation metrics" (#33733)

b423016 avatar Oct 02 '24 16:10 b423016

Awesome! Just added the tag to make sure it works for everyone! 🥳

ArthurZucker avatar Oct 03 '24 15:10 ArthurZucker

I want to work on #29348. Please assign this to me.

Thejaggeddevil avatar Oct 11 '24 08:10 Thejaggeddevil

Hey @ArthurZucker, will this be counted in Hacktoberfest?

eeshan15 avatar Oct 11 '24 19:10 eeshan15

Yes, given that there is the tag! We don't assign issues: the first PR that is up will be reviewed. If it goes stale, anyone can take it over, and if no PR is linked, you can also create one 🤗

ArthurZucker avatar Oct 22 '24 13:10 ArthurZucker

Hi, I don't think https://github.com/huggingface/transformers/issues/28469 has been fixed yet. Facing this even in 4.46.3.

I have a hacky workaround here: custom_trainer. It works on the repro script, but might not cover all cases.

naba89 avatar Nov 30 '24 11:11 naba89

> I don't think https://github.com/huggingface/transformers/issues/28469 has been fixed yet. Facing this even in 4.46.3.

Just reopened! If you have a fix, would you like to open a PR so that we can have a look? Thanks!

SunMarc avatar Dec 02 '24 15:12 SunMarc

Hi, I'm new here and excited to learn and contribute.

Arbaaz123676 avatar Oct 03 '25 13:10 Arbaaz123676

Hey, are there any issues left to be assigned? If yes, can you assign one to me?

shivigoyal4321 avatar Oct 11 '25 11:10 shivigoyal4321

Hi @ArthurZucker, is there anything remaining I can contribute here for Hacktoberfest? Happy to be a part of such a great repo. Thanks :)

sonianuj287 avatar Oct 13 '25 10:10 sonianuj287

This issue can be closed as we haven't updated it for a long time. Please check the other issues in the repo!

SunMarc avatar Oct 13 '25 14:10 SunMarc