[BUG] deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer

Open LUMO666 opened this issue 1 year ago • 2 comments

Running /scripts/run_raft_align.sh in Docker fails with the following error:

deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a client provided optimizer (<class 'transformers.optimization.AdamW'>) which in most cases will yield poor performance. Please either use deepspeed.ops.adam.DeepSpeedCPUAdam or set an optimizer in your ds-config (https://www.deepspeed.ai/docs/config-json/#optimizer-parameters). If you really want to use a custom optimizer w. ZeRO-Offload and understand the performance impacts you can also set <"zero_force_ds_cpu_optimizer": false> in your configuration file.

Is this related to mpi4py? I'm not sure whether I have mpi4py installed correctly. Thanks.

LUMO666 avatar Jan 18 '24 09:01 LUMO666

@WeiXiongUST @hendrydong I am wondering if you could take a look? Thanks 🙏

research4pan avatar Jan 19 '24 01:01 research4pan

Hi, it looks like the "ZeRO-Offload" configuration is not correct; you may want to double-check the yaml file.

BTW, this is probably more related to the DeepSpeed configuration itself.
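
For reference, the error message itself names two configuration-side options: declare the optimizer inside the DeepSpeed config, or set "zero_force_ds_cpu_optimizer" to false. Below is a minimal sketch of both, written as plain Python that builds the config dictionary. The baseline config, the helper names, and the learning rate are assumptions for illustration and are not taken from the LMFlow repository; only the "optimizer" and "zero_force_ds_cpu_optimizer" keys come from the error text and the DeepSpeed config documentation it links to.

```python
import copy
import json

# Hypothetical baseline (not LMFlow's actual file): ZeRO stage 2 with the
# optimizer state offloaded to CPU, which is the setup that triggers the check.
base_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}


def with_forced_check_disabled(config):
    """Return a copy that keeps the client optimizer and disables the check.

    This matches the last suggestion in the error message and accepts the
    performance cost it warns about.
    """
    patched = copy.deepcopy(config)
    patched["zero_force_ds_cpu_optimizer"] = False
    return patched


def with_deepspeed_optimizer(config, lr=2e-5):
    """Return a copy that declares the optimizer in the DeepSpeed config.

    The lr value is a placeholder; use whatever the training script expects.
    """
    patched = copy.deepcopy(config)
    patched["optimizer"] = {"type": "AdamW", "params": {"lr": lr}}
    return patched


if __name__ == "__main__":
    # Print both variants as JSON so they can be compared with an existing config.
    print(json.dumps(with_forced_check_disabled(base_config), indent=2))
    print(json.dumps(with_deepspeed_optimizer(base_config), indent=2))
```

Either variant would then be merged into the config file that the script passes to DeepSpeed. Whether the training code also has to stop constructing its own AdamW when the optimizer is declared in the config depends on how LMFlow sets up the trainer, which this thread does not show.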

hendrydong avatar Jan 20 '24 04:01 hendrydong