DeepSpeed
`dont_change_device` for parameters in initialization
When I was running model training with ZeRO-Offload, I initialized the model weights on CPU memory (to save GPU memory) by setting up `deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False)`. Although the model weights are indeed initialized on CPU memory, after `deepspeed.initialize()` the model is still moved to GPU memory. So I am wondering:
- In ZeRO-Offload (stage 3, offload to CPU/NVMe), is it possible for the model weights to stay mainly in CPU memory/NVMe and only be loaded layer by layer into GPU memory?
- I found that in `engine.py` (which is actually called by `deepspeed.initialize()`) there is an argument `dont_change_device` ([link](https://github.com/microsoft/DeepSpeed/blob/4ae3a3da0dfd19d7ab7a76e7c742ac12f44fc1c0/deepspeed/runtime/engine.py#L1138-L1139)) which controls whether or not the model weights are moved to GPU memory. But I also found no place that sets `dont_change_device`. So my question is: how do I use `dont_change_device`, and is it meant to keep the model weights in CPU memory?
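For context, here is a minimal sketch of the setup described above, assuming a placeholder `MyModel` class and a ds_config path; note that with `enabled=False` the context manager is a no-op, which is relevant to the answer below:

```python
import torch
import deepspeed

from my_model import MyModel  # placeholder model class, for illustration only

# As described above: initialize weights on CPU in fp16. Note that with
# enabled=False this context manager does nothing, so ZeRO-3 partitioned
# initialization is skipped entirely.
with deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False):
    model = MyModel()

# deepspeed.initialize() then builds the engine and, by default, moves the
# module onto the local GPU, which is the behavior being asked about.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # assumed config path
)
```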
@larry-fuy, to enable ZeRO stage 3 offloading to CPU/NVMe, `enabled` must be `True` in `deepspeed.zero.Init()`. Please see this tutorial for using this feature (a.k.a. ZeRO-Infinity). Here are answers to your specific questions:
- Streaming layer weights into GPU from CPU/NVMe on demand, as you have described, is one of the features of ZeRO-Infinity. You can configure `"offload_param"` in the ds_config to control this behavior (see the example config below).
- You should not need to manipulate `dont_change_device`. We can revisit this if the above suggestions don't work for you.
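A minimal sketch of the relevant ds_config fragment, expressed as a Python dict; the exact values are illustrative, and `"device"` can be `"cpu"` or `"nvme"` (NVMe offload additionally requires an `"nvme_path"`):

```python
# Illustrative ZeRO stage 3 config with parameter offloading; the batch
# size and other training fields are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",  # or "nvme" (with "nvme_path" set)
            "pin_memory": True,
        },
    },
}
```

With this config and `enabled=True` in `deepspeed.zero.Init()`, parameters are partitioned and fetched into GPU memory on demand during the forward and backward passes.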
Thanks!
I'd like to set the option `dont_change_device` because DeepSpeed is calling `module.to()`, which raises an error in transformers with my 4-bit model:
```
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
```
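A minimal sketch of how this conflict arises, assuming the quantized model is passed directly to `deepspeed.initialize()`; the model id and config path are placeholders:

```python
import deepspeed
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder model id, for illustration only.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-causal-lm",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# The engine setup calls module.to(device) internally; transformers blocks
# .to() on 4-bit/8-bit quantized models, raising the ValueError shown above.
model_engine, *_ = deepspeed.initialize(model=model, config="ds_config.json")
```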
Same issue with the BitsAndBytes Llama 3.1 70B Instruct 4-bit model.