dumpmemory


> FSDP is already supported out of the box. Unsure if we need to support deepspeed?

We need DeepSpeed for sure.

> @dumpmemory Can you elaborate? What's your usecase?

Usually we use DeepSpeed's ZeRO 2 or 3 to train large models, and for the small ones, we also use...
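For context, a minimal sketch of the kind of ZeRO configuration meant here; the values are illustrative placeholders, not taken from any particular run:

```python
# Minimal DeepSpeed ZeRO config sketch (illustrative values only).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,           # ZeRO-3 shards params, grads and optimizer state; use 2 for ZeRO-2
        "overlap_comm": True,
    },
}
# Such a dict is typically passed to deepspeed.initialize(config=ds_config)
# or saved as JSON and referenced from transformers' TrainingArguments(deepspeed=...).
```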

How about this one: https://github.com/microsoft/DeepSpeed/issues/2637 ? It seems the only option is to disable zero.init with Accelerate.
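A minimal sketch of disabling zero.init through Accelerate, assuming a recent accelerate release where `DeepSpeedPlugin` exposes `zero3_init_flag` (check the flag name against your installed version):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Keep ZeRO-3 sharding, but skip zero.Init so modules are materialized
# normally at construction time instead of being partitioned as they are created.
ds_plugin = DeepSpeedPlugin(
    zero_stage=3,
    zero3_init_flag=False,  # the "disable zero.init" switch
)

accelerator = Accelerator(deepspeed_plugin=ds_plugin)
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```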

> Actually @tohtana has just created a PR that is supposed to fix both issues: [microsoft/DeepSpeed#2989](https://github.com/microsoft/DeepSpeed/pull/2989)
>
> I will be able to try it probably tomorrow, but please go...

https://github.com/microsoft/DeepSpeed/issues/2637 still exists even with https://github.com/microsoft/DeepSpeed/pull/2989. My setup is described in https://github.com/huggingface/peft/issues/161.

> Thank you for testing [microsoft/DeepSpeed#2989](https://github.com/microsoft/DeepSpeed/pull/2989), @dumpmemory - sorry to hear it didn't resolve the leak - perhaps file a new issue in DS, as the one I posted I...

I have the same issue when training Mixtral 8x7B with transformers 4.36 and deepspeed 0.12.4 (also 0.12.3), ZeRO-3 with gradient_checkpointing enabled. It hangs after around 1.5 hours of training.
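A rough sketch of that setup as Trainer arguments; the config filename and batch size are placeholders, only the ZeRO-3 plus gradient checkpointing combination comes from the report above:

```python
from transformers import TrainingArguments

# Model and dataset are omitted; this only shows the flags involved in the hang.
args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_zero3_config.json",  # ZeRO-3 config file (placeholder name), deepspeed 0.12.4
    gradient_checkpointing=True,       # enabled in the run that hangs after ~1.5 hours
    per_device_train_batch_size=1,     # placeholder value
)
```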

I have also tried the tohtana/nested_zero_init branch, which did not fix it.