About DEFAULT_MIN_MEM_CONFIG in autotuning

Open MrZhengXin opened this issue 2 years ago • 5 comments

https://github.com/microsoft/DeepSpeed/blob/b361c72761d97f5a1714a3e91d1f7c36fd3cfdd8/deepspeed/autotuning/constants.py#L142-L148

That config doesn't include parameter and optimizer offload, even though offloading is what gives the minimal memory footprint:

        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
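
For illustration, here is a minimal sketch of what a minimum-memory config with CPU offload could look like. The base dictionary is an assumption about the shape of `DEFAULT_MIN_MEM_CONFIG` (micro-batch size 1, ZeRO stage 3), not a verbatim copy of the linked lines, and `with_cpu_offload` is a hypothetical helper that DeepSpeed does not ship.

```python
import copy

# Assumed shape of the autotuner's minimum-memory config; the linked
# constants.py lines are the authoritative definition.
ASSUMED_MIN_MEM_CONFIG = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

# The offload settings quoted in this issue, as a Python dict.
CPU_OFFLOAD = {"device": "cpu", "pin_memory": True}


def with_cpu_offload(config):
    """Return a copy of `config` with optimizer and parameter offload to CPU.

    Hypothetical helper for illustration only.
    """
    merged = copy.deepcopy(config)  # leave the input config untouched
    zero = merged.setdefault("zero_optimization", {})
    zero["offload_optimizer"] = dict(CPU_OFFLOAD)
    zero["offload_param"] = dict(CPU_OFFLOAD)
    return merged


min_mem_with_offload = with_cpu_offload(ASSUMED_MIN_MEM_CONFIG)
print(min_mem_with_offload)
```

Both offload sections live under `zero_optimization`, and `offload_param` only applies with ZeRO stage 3, which is why the sketch keeps the stage at 3.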

MrZhengXin avatar May 07 '23 05:05 MrZhengXin

Hi @MrZhengXin - could you elaborate more on the question?

loadams avatar May 09 '23 17:05 loadams

> Hi @MrZhengXin - could you elaborate more on the question?

Sure. Suppose you want to auto-tune a 30B model. The autotuner first tries to run with the minimum-memory setting, but the current minimum-memory setting does not include offload, so the 30B model runs out of memory and the auto-tuning cannot proceed.
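
A rough back-of-the-envelope estimate shows why. The sketch below assumes mixed-precision Adam, where model states cost about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 weights, momentum, and variance), fully partitioned across GPUs as in ZeRO stage 3. The 8-GPU count is an assumption chosen for illustration, and activations and temporary buffers are ignored.

```python
def model_state_gib(num_params, num_gpus, bytes_per_param=16):
    """Per-GPU model-state memory in GiB with states fully partitioned.

    Covers weights, gradients, and optimizer states only; activations
    and temporary buffers come on top of this.
    """
    return num_params * bytes_per_param / num_gpus / 2**30


# Hypothetical setup: a 30B-parameter model on 8 GPUs, no offload.
per_gpu = model_state_gib(30e9, num_gpus=8)
print(f"{per_gpu:.1f} GiB of model states per GPU")
```

Under these assumptions the model states alone come to roughly 56 GiB per GPU before any activations are counted, so a setup without offload can fail even at micro-batch size 1.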

MrZhengXin avatar May 10 '23 02:05 MrZhengXin

Thanks, @cli99 is probably best suited to answer this.

loadams avatar May 11 '23 16:05 loadams

Hi @MrZhengXin, the autotuning feature currently does not support offloading. We plan to include it in the near future.

Thanks.

cli99 avatar May 11 '23 17:05 cli99

> Hi @MrZhengXin, the autotuning feature currently does not support offloading. We plan to include it in the near future.
>
> Thanks.

Hi, thanks for the response! By the way, even if tuning the offload settings is not supported, maybe the autotuner could just use fixed offload parameters specified by the user?

> Offloading and NVME
>
> Currently, the DeepSpeed Autotuner does not tune offloading behaviors but instead uses the values defined in the offload section of the DeepSpeed configuration file.
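
As a sketch of what that would mean in practice, a user config could enable autotuning and pin the offload settings in the same file. The key names below are standard DeepSpeed config keys, but `offload_devices` is a hypothetical helper, and whether the autotuner carries this section through to every experiment is exactly what this thread is asking, not something the sketch verifies.

```python
import json

# User-side DeepSpeed config: autotuning on, offload values fixed by hand.
ds_config = {
    "autotuning": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}


def offload_devices(config):
    """Map each offload section present in `config` to its target device.

    Hypothetical helper for inspecting a config; not a DeepSpeed API.
    """
    zero = config.get("zero_optimization", {})
    return {
        key: zero[key]["device"]
        for key in ("offload_optimizer", "offload_param")
        if key in zero
    }


print(json.dumps(ds_config, indent=2))  # what would go into the JSON config file
print(offload_devices(ds_config))
```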

MrZhengXin avatar May 12 '23 01:05 MrZhengXin

@MrZhengXin, sorry for the late response, I was out of office for a while. The autotuner should take the offload configuration as it is. If it does not behave this way, can you share the example code you are running so we can reproduce the error on our end? Thanks!

cli99 avatar Jun 21 '23 18:06 cli99

Closing for now. If you are still seeing this issue or are able to share the example code, please re-open and we would be happy to take a look.

loadams avatar Jul 24 '23 17:07 loadams