Zach Mueller

Results 368 comments of Zach Mueller

Can you provide some more of the logs? Earlier in the trace I see:
> The model is not runnable with DeepSpeed with error

Is there any other trace? This is the chunk I'm talking about. It looks cut off:
```bash
HERE -> [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed...
```
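When a trace looks cut off like this, one way to share the relevant part is to pull out the lines around each `[ERROR]` entry. The following is a minimal sketch, assuming the log is plain text whose entries follow the `[timestamp] [LEVEL] [file:line:function] message` shape of the line quoted above. `LOG_LINE` and `error_context` are hypothetical names for illustration, not part of DeepSpeed or Accelerate.

```python
import re

# Assumed entry shape, taken from the line quoted above:
# [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<origin>[^\]]+)\] (?P<message>.*)$"
)


def error_context(lines, before=20, after=20):
    """Return the log lines surrounding each ERROR entry, in order, without duplicates."""
    keep = set()
    for i, line in enumerate(lines):
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            # Keep a window of lines around the error so the cause is visible.
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]
```

To use it, read the log file into a list with `splitlines()`, pass that list to `error_context`, and paste the joined result into the issue. Lines that do not match the assumed shape (for example, Python tracebacks) are still included when they fall inside the window.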

cc @amyeroberts for final review :)

Can you all try installing with `pip install git+https://github.com/huggingface/transformers@muellerzr-deepspeed-item`? This PR may have fixed this as well: https://github.com/huggingface/transformers/pull/29444

Working on trying to reproduce this :)

I was able to reproduce this. A minimal repro is below:
```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, HfDeepSpeedConfig
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model

transformers_config =...
```

@Luodian can you try again with `num_processes > 1`? I couldn't reproduce it. I can only reproduce your main example here because Accelerate currently doesn't really support single-GPU DeepSpeed.

Hi all, could you explain more about how you're using `Accelerator.save_state()` here? We don't expose that part of the API in the `Trainer`, so *how* that is being called could...

@mzamini92 could you rebase so we can double check no tests are breaking with this and we can merge? Thanks!