Max Idahl
> I can confirm that I only experience this issue when using Zero3, and Zero 2 works fine.

I just ran into the same error, can confirm switching from zero3...
> > I can confirm the same error when finetuning Mistral with chatml format and deepspeed3.
> >
> > ```
> > loading model
> > Traceback (most recent call last):
> > ...
> > ```
Here is a working example you can try:

```python
from functools import partial

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import LlamaTokenizer, LlamaForCausalLM
...
```
Just to document my experience getting DDP + MP (2x2 on 4 GPUs) to work with Accelerate (via the HF Trainer): I modified the current main branch to initialize the...
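For anyone trying to reproduce the 2x2 layout, here is a minimal sketch of how global ranks can map onto data-parallel and model-parallel groups. This is a hypothetical illustration assuming contiguous model-parallel ranks, not the actual modified Accelerate code from the comment above:

```python
# Sketch: map global ranks to data-parallel (DP) and model-parallel (MP)
# coordinates for a 2x2 layout on 4 GPUs. Hypothetical illustration only;
# the thread's actual modified code is not shown here.

MP_SIZE = 2  # GPUs sharing one model replica (model parallel)
DP_SIZE = 2  # number of model replicas (data parallel)

def rank_to_groups(rank: int, mp_size: int = MP_SIZE):
    """Assumes MP ranks are contiguous: ranks {0, 1} form one
    model replica and ranks {2, 3} form the other."""
    dp_rank = rank // mp_size  # which model replica this GPU belongs to
    mp_rank = rank % mp_size   # position of this GPU within its replica
    return dp_rank, mp_rank

if __name__ == "__main__":
    for rank in range(MP_SIZE * DP_SIZE):
        dp, mp = rank_to_groups(rank)
        print(f"global rank {rank} -> dp_rank {dp}, mp_rank {mp}")
```

With this layout, ranks 0 and 2 hold the same model shard across the two replicas, which is the pairing a gradient all-reduce over the data-parallel group would use.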
> @maxidl can you share your modified code? Curious what those exceptions are that exist for "no good reason"

@muellerzr I do think these errors are necessary if one does...
Sure, that sounds great. Once the changes are in (no rush with that), I might create a tutorial-style GitHub repo for it and do some benchmarking, to be shared via...