Max Idahl
> I can confirm that I only experience this issue when using Zero3, and Zero 2 works fine.

I just ran into the same error, can confirm switching from zero3...
> > I can confirm the same error when finetuning Mistral with chatml format and deepspeed3.
> >
> > ```
> > loading model
> > Traceback (most recent call last):
> > ...
> > ```
Here is a working example you can try:

```python
from functools import partial

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import LlamaTokenizer, LlamaForCausalLM
...
```
Just to document my experience getting DDP + MP (2x2 on 4 GPUs) to work with Accelerate (via the HF Trainer): I modified the current main branch to initialize the...
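For anyone trying to reproduce the 2x2 layout, here is a minimal sketch of how global ranks can map onto data-parallel and model-parallel groups. This is a hypothetical illustration assuming contiguous model-parallel ranks, not the actual modified Accelerate code from the comment above:

```python
# Sketch: map global ranks to data-parallel (DP) and model-parallel (MP)
# coordinates for a 2x2 layout on 4 GPUs. Hypothetical illustration only;
# the thread's actual modified code is not shown here.

MP_SIZE = 2  # GPUs sharing one model replica (model parallel)
DP_SIZE = 2  # number of model replicas (data parallel)

def rank_to_groups(rank: int, mp_size: int = MP_SIZE):
    """Assumes MP ranks are contiguous: ranks {0, 1} form one
    model replica and ranks {2, 3} form the other."""
    dp_rank = rank // mp_size  # which model replica this GPU belongs to
    mp_rank = rank % mp_size   # position of this GPU within its replica
    return dp_rank, mp_rank

if __name__ == "__main__":
    for rank in range(MP_SIZE * DP_SIZE):
        dp, mp = rank_to_groups(rank)
        print(f"global rank {rank} -> dp_rank {dp}, mp_rank {mp}")
```

With this layout, ranks 0 and 2 hold the same model shard across the two replicas, which is the pairing a gradient all-reduce over the data-parallel group would use.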
> @maxidl can you share your modified code? Curious what those exceptions are that exist for "no good reason"

@muellerzr I do think these errors are necessary if one does...
Sure, that sounds great. Once the changes are in (no rush with that), I might create a tutorial-style GitHub repo for it and do some benchmarking, to be shared via...