[BUG] offloading section in config file never carried to autotuner
Describe the bug
I tried to enable offloading in the zero2_auto.json file with the following lines:
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
...
}
It works fine for normal runs without the --autotune flag. However, once I use deepspeed --autotune, none of the automatically generated .json files contain the offload_optimizer section (e.g., every sample JSON file produced by the autotuner omits it).
This contradicts what is stated in the README: "Currently, the DeepSpeed Autotuner does not tune offloading behaviors but instead uses the values defined in the offload section of the DeepSpeed configuration file." [https://github.com/microsoft/DeepSpeed/tree/master/deepspeed/autotuning#offloading-and-nvme]
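A quick way to confirm this is to scan the generated trial configs for the missing block (a minimal sketch; the autotuning_exps directory name and the per-experiment ds_config.json layout are assumptions about the default autotuner output, so adjust the path to your run):

```python
# Minimal sketch: list every generated trial config and report whether it kept
# the offload_optimizer block. The "autotuning_exps" directory name and the
# per-experiment ds_config.json layout are assumptions; adjust to your run.
import json
from pathlib import Path

for cfg_path in sorted(Path("autotuning_exps").rglob("ds_config.json")):
    zero = json.loads(cfg_path.read_text()).get("zero_optimization", {})
    present = "offload_optimizer" in zero
    print(f"{cfg_path}: {'present' if present else 'MISSING'}")
```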
To Reproduce
```bash
git clone https://github.com/cxxz/llama_deepspeed_autotune.git
cd llama_deepspeed_autotune
./run_autotune_llama_4A100.sh
```
Expected behavior
Every ds_config.json generated during the search should include the offload section.
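Concretely (a minimal sketch, with the same path assumptions as above): each trial config should carry the offload_optimizer block from zero2_auto.json verbatim, which the snippet below asserts.

```python
# Minimal sketch of the expected invariant: every generated trial config keeps
# the user's offload_optimizer block unchanged. Paths are assumptions; adjust
# to where the autotuner writes its experiments on your machine.
import json
from pathlib import Path

expected = json.loads(Path("zero2_auto.json").read_text())["zero_optimization"]["offload_optimizer"]

for cfg_path in Path("autotuning_exps").rglob("ds_config.json"):
    generated = json.loads(cfg_path.read_text()).get("zero_optimization", {}).get("offload_optimizer")
    assert generated == expected, f"{cfg_path} dropped or altered offload_optimizer"
print("all generated configs kept the offload_optimizer section")
```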
System info (please complete the following information):
- Python version: 3.10
- DeepSpeed version: 0.9.1
@cxxz, can you try the latest transformers and accelerate libraries? I cannot reproduce the error on my end; the offload section is included in my test. Thanks.
Thank you for responding to my request. I installed both via pip install git+https://github.com/huggingface/transformers and pip install git+https://github.com/huggingface/accelerate, as confirmed by pip freeze. However, after rerunning run_autotune_llama_4A100.sh, the offload section still fails to be carried over to the ds_config.json files in all attempts. The complete log is documented in the repository. Any hint on what settings might have gone wrong?