axolotl

Add Example for DeepSpeed Config with CPU Offloading

Open PhilipMay opened this issue 1 year ago • 4 comments

⚠️ Please check that this feature request hasn't been suggested before.

  • [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • [X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

I would like to use CPU offloading, but there is no example DeepSpeed config for it.

✔️ Solution

Provide a config.

Maybe something like this, but I am not 100% sure:

{
  "zero_optimization": {
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    },
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  ...
}
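To sanity-check a fragment like the one above before using it, here is a minimal Python sketch. It assumes only the `zero_optimization` block matters: the fragment is rebuilt as a dict, with the keys elided by `...` deliberately left out and `1e9` written as the integer `1_000_000_000`. It then flags offload settings that DeepSpeed does not support. `ZERO3_CPU_OFFLOAD` and `check_offload` are hypothetical names. The two rules checked are that `offload_param` is only valid with ZeRO stage 3 and that the offload device must be one of "cpu", "nvme", or "none", both taken from the DeepSpeed configuration docs.

```python
import json

# Values copied from the fragment proposed in this issue; top-level keys
# elided by "..." (batch size, precision, etc.) are intentionally omitted.
ZERO3_CPU_OFFLOAD = {
    "zero_optimization": {
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "sub_group_size": 1_000_000_000,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1_000_000_000,
        "stage3_max_reuse_distance": 1_000_000_000,
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}


def check_offload(config: dict) -> list[str]:
    """Return the problems found in the offload settings (empty list if none)."""
    zero = config.get("zero_optimization", {})
    stage = zero.get("stage", 0)
    problems = []
    # Parameter offloading is only supported with ZeRO stage 3.
    if "offload_param" in zero and stage != 3:
        problems.append(f"offload_param requires stage 3, got stage {stage}")
    # Each offload section must name a supported device.
    for key in ("offload_optimizer", "offload_param"):
        section = zero.get(key)
        if section is not None and section.get("device") not in ("cpu", "nvme", "none"):
            problems.append(f"{key}.device must be 'cpu', 'nvme' or 'none'")
    return problems


if __name__ == "__main__":
    print(check_offload(ZERO3_CPU_OFFLOAD))  # -> []
    print(json.dumps(ZERO3_CPU_OFFLOAD, indent=2))
```

The check deliberately stays small. It does not validate the other ZeRO-3 tuning keys, whose "auto" values are resolved at launch time rather than stored in the file.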

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

PhilipMay, Jan 26 '24 16:01

The example deepspeed configs are in the deepspeed_configs folder.

noobmaster29, Jan 28 '24 01:01

> The example deepspeed configs are in the deepspeed_configs folder.

@noobmaster29 But none of them is a CPU offloading config.

PhilipMay, Jan 28 '24 11:01
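To see which configs in the deepspeed_configs folder enable offloading, here is a minimal Python sketch. It assumes every file in the folder is plain JSON with a top-level `zero_optimization` key. The function name `offload_settings` is hypothetical, and the folder path assumes you run it from the root of an axolotl checkout.

```python
import json
from pathlib import Path


def offload_settings(config_dir: str) -> dict[str, dict[str, str]]:
    """Map each *.json file in config_dir to the offload devices it configures."""
    results = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        zero = json.loads(path.read_text()).get("zero_optimization", {})
        # Keep only the offload sections that are present in this config.
        results[path.name] = {
            key: zero[key].get("device", "none")
            for key in ("offload_optimizer", "offload_param")
            if isinstance(zero.get(key), dict)
        }
    return results


if __name__ == "__main__":
    # Assumed to run from the root of an axolotl checkout.
    for name, offload in offload_settings("deepspeed_configs").items():
        print(f"{name}: {offload or 'no offloading'}")
```

An empty mapping for a file means that it configures neither optimizer nor parameter offloading.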

zero3 used to include offloading, but it was removed. If you want to make a PR, you could re-add the old revision of zero3 under a slightly different name, like zero3_cpuoffload.

NanoCode012, Jan 31 '24 14:01
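One way to follow the suggestion above is to derive the offload variant from the existing zero3 config instead of hand-copying it. The minimal Python sketch below adds the two offload blocks from the fragment proposed in this issue to a copy of a stage-3 config. The input and output paths (`deepspeed_configs/zero3.json`, `zero3_cpuoffload.json`) are assumptions based on the names mentioned in this thread, and `with_cpu_offload` is a hypothetical helper.

```python
import copy
import json
from pathlib import Path

# Offload blocks taken from the fragment proposed earlier in this issue.
CPU_OFFLOAD = {
    "offload_optimizer": {"device": "cpu", "pin_memory": True},
    "offload_param": {"device": "cpu", "pin_memory": True},
}


def with_cpu_offload(base: dict) -> dict:
    """Return a copy of a ZeRO-3 config with optimizer and parameter CPU offloading."""
    if base.get("zero_optimization", {}).get("stage") != 3:
        raise ValueError("parameter offloading needs ZeRO stage 3")
    config = copy.deepcopy(base)  # leave the original zero3 config untouched
    config["zero_optimization"].update(copy.deepcopy(CPU_OFFLOAD))
    return config


if __name__ == "__main__":
    # Hypothetical paths inside an axolotl checkout.
    src = Path("deepspeed_configs/zero3.json")
    dst = src.with_name("zero3_cpuoffload.json")
    if src.exists():
        new_config = with_cpu_offload(json.loads(src.read_text()))
        dst.write_text(json.dumps(new_config, indent=2) + "\n")
        print(f"wrote {dst}")
```

The generated file would then be referenced from the axolotl YAML's `deepspeed:` option in the same way as the existing zero3 config.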

Added PR #1466 for this.

NanoCode012, Mar 30 '24 19:03