Zach Mueller
Can you provide a bit more of the logs? Earlier in the trace I see:

> The model is not runnable with DeepSpeed with error
Is there any other trace? This is the chunk I'm talking about. It looks cut off:

```bash
HERE -> [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed...
```
Gentle ping @shijie-wu :)
cc @amyeroberts for final review :)
Can you all try installing with `pip install git+https://github.com/huggingface/transformers@muellerzr-deepspeed-item`? This PR may have fixed this as well: https://github.com/huggingface/transformers/pull/29444
Working on reproducing this :)
I could successfully reproduce this; a minimal repro is below:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, HfDeepSpeedConfig
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model

transformers_config =...
```
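For context, a rough sketch of how those pieces are typically wired together for a repro like this; the config values and the model name below are placeholders I've filled in, not the ones from the original (cut-off) snippet:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin, HfDeepSpeedConfig
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import unwrap_model

# Placeholder ZeRO-3 config; the original repro's values are cut off above.
transformers_config = HfDeepSpeedConfig({
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {"stage": 3},
})

# Hand the DeepSpeed config to Accelerate explicitly rather than via `accelerate config`.
accelerator = Accelerator(
    deepspeed_plugin=DeepSpeedPlugin(hf_ds_config=transformers_config)
)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any small causal LM works here
model = accelerator.prepare(model)

# Inspect what `unwrap_model` returns once DeepSpeed has wrapped the model.
print(type(unwrap_model(model)))
```

Assuming the original repro was along these lines, running it with `accelerate launch` on a single process is where the behavior discussed below shows up.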
@Luodian can you try again with `num_processes > 1`? I couldn't reproduce it. I can only reproduce your main example here because Accelerate doesn't currently really support single-GPU DeepSpeed.
Hi all, can you explain a bit more about how you're using `Accelerator.save_state()` here? We don't expose that part of the API in the `Trainer`, so *how* that is being called could...
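For reference, this is the kind of direct usage I mean (a minimal, self-contained sketch; the toy model, optimizer, and checkpoint directory are just placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy objects purely for illustration.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = DataLoader(
    TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,))),
    batch_size=2,
)

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Direct calls into the Accelerate checkpointing API, outside of the Trainer.
accelerator.save_state("checkpoint_dir")
accelerator.load_state("checkpoint_dir")
```

If you're calling it some other way (e.g. from a callback, or by reaching into the `Trainer`'s internal accelerator object), that detail is exactly what would help here.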
@mzamini92 could you rebase so we can double-check that no tests are breaking with this, and then we can merge? Thanks!