
[REQUEST] Dynamic model offload support for ZeRO-3 inference models

Open kfertakis opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe. The issue is related to #5620 and #6011. When a DeepSpeed model is initialised for ZeRO-3 inference (with a DeepSpeedZeRoOffload instance managing the parameters, for example), the model cannot be moved to the CPU, either with the torch.nn.Module.to() functionality or with the new offload_states API.
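
For reference, a minimal sketch of the failure mode (assuming a single GPU and a Hugging Face causal LM; "MODEL_NAME" is a placeholder, not from my actual setup):

```python
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    # No offload_param section: the ZeRO-3 partitioned weights start on the GPU.
    "zero_optimization": {"stage": 3},
}

model = AutoModelForCausalLM.from_pretrained("MODEL_NAME")  # placeholder name
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

# Neither call relocates the partitioned weights to host memory:
engine.module.to("cpu")    # leaves the GPU-resident partitions in place
engine.offload_states()    # the #6011 API, which assumes a ZeRO optimizer
```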

Describe the solution you'd like Either extend #6011 to support offloading a model configured for ZeRO-3 inference, or add a new API that supports this.
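
A hypothetical sketch of the desired usage (this is not existing behaviour; it assumes the offload_states/reload_states API and enums from #6011, extended to cover engines initialised without an optimizer):

```python
# Hypothetical: offload_states working on an optimizer-less ZeRO-3 engine.
from deepspeed.runtime.zero.offload_config import (OffloadDeviceEnum,
                                                   OffloadStateTypeEnum)

# Desired: move only the model weights of the inference engine to host memory,
engine.offload_states(include=[OffloadStateTypeEnum.lp_params],
                      device=OffloadDeviceEnum.cpu,
                      pin_memory=True)
# reclaim the GPU for other work, and later bring the weights back:
engine.reload_states()
```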

Thanks

kfertakis avatar Oct 01 '24 14:10 kfertakis

@kfertakis, can you please clarify your ask here, since:

  1. ZeRO-Inference does not include optimizer state.
  2. ZeRO-Inference normally hosts model weights in CPU or NVMe (see the config sketch below).
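
For point 2, a minimal config sketch of how ZeRO-Inference typically hosts weights off-GPU (values are illustrative):

```python
# Illustrative ZeRO-Inference config: stage-3 parameter offload to CPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        # or, to host the weights on NVMe (path is a placeholder):
        # "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}
```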

tjruwase avatar Oct 01 '24 14:10 tjruwase

It might be helpful to use example logs/screenshots from the following to demonstrate the problem: https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md

Thanks!

tjruwase avatar Oct 01 '24 14:10 tjruwase

@tjruwase, thanks for the example reference. You're right, I should clarify a bit better. The issue does not refer to optimizer states but rather to the weights of a ZeRO-Inference model initially placed in GPU memory.

Indeed, if you configure ZeRO-Inference to host model weights in CPU memory at initialisation time, as with the --cpu-offload option in the example code, GPU memory will not be used. However, the issue I am referring to arises when the model is initially placed in GPU memory (no --cpu-offload flag in the example) and then needs to be moved dynamically to CPU memory at runtime, much as the offload_states API (#6011) accomplishes, disregarding the optimizer state, which is not relevant in this case. Using the torch.nn.Module.to() or offload_states functionality does not move the DeepSpeed-initialised ZeRO-Inference model to CPU memory. A sketch of how this could be demonstrated with memory logging follows.
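
For instance (a sketch; engine is assumed to be a ZeRO-3 inference engine with its weights resident on the GPU), logging allocated GPU memory around the attempted offload shows that nothing is freed:

```python
import torch

def log_gpu_mem(tag: str) -> None:
    # Report currently allocated GPU memory for the default device.
    torch.cuda.synchronize()
    print(f"{tag}: {torch.cuda.memory_allocated() / 2**30:.2f} GiB allocated")

log_gpu_mem("before offload")
engine.module.to("cpu")            # expected to free the weights' GPU memory ...
log_gpu_mem("after .to('cpu')")    # ... but the allocated amount is unchanged
```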

Thanks.

kfertakis avatar Oct 01 '24 15:10 kfertakis