Zach Mueller comments

Results 368 comments of


                                            Zach Mueller

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

Can you provide a bit more logs? Earlier in the trace I see > The model is not runnable with DeepSpeed with error

[2023-12-04 11:52:08,378] [INFO] [autotuner.py:1110:run_after_tuning] No optimal DeepSpeed configuration found by autotuning.

Is there any other trace? This is the chunk I'm talking about. It looks cut off: ```bash HERE -> [2023-12-04 11:52:08,378] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed...

storing & logging gradient norm in trainer

Gentle ping @shijie-wu :)

storing & logging gradient norm in trainer

cc @amyeroberts for final review :)

storing & logging gradient norm in trainer

Can you all try installing with `pip install git+https://github.com/huggingface/transformers@muellerzr-deepspeed-item`? This PR may have fixed this too as well: https://github.com/huggingface/transformers/pull/29444

Shared tensors not correctly saved.

Working on trying to reproduce this :)

Shared tensors not correctly saved.

Successfully could reproduce, a minimal repr is below: ```python import torch from accelerate import Accelerator from accelerate.utils import DeepSpeedPlugin, HfDeepSpeedConfig from transformers import AutoModelForCausalLM from transformers.modeling_utils import unwrap_model transformers_config =...