
[automodel][draft] Integrate Megatron Custom FSDP2 into NeMo Automodel.


Summary

Status: DRAFT. Tentatively vendoring the CFSDP2 source code from a Megatron branch specifically for Automodel until the NeMo-Megatron path works again.

  • Integrates custom FSDP2 (CFSDP2) into NeMo Automodel, in close collaboration with @shjwudp.
    • The Torch-native Automodel support for CFSDP2 in Megatron (https://gitlab-master.nvidia.com/ADLR/megatron-lm/-/merge_requests/3150, working branch: jianbinc/custom_fsdp_dtensor_ckpt) needs to be merged before this PR can be merged!

TODO

  • Test TP and CP with CFSDP2 in Automodel after implementing support for DTensor buffering with CFSDP2.

Collection: nemo.lightning.pytorch.strategies.fsdp2_strategy

Changelog

  • Added options and utilities to wrap the Automodel in FSDP, which shards and communicates optimizer state, gradients, and model parameters using dynamically allocated tensors (see the sketch below).
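For orientation, here is a minimal sketch of the per-layer sharding pattern using torch-native FSDP2's `fully_shard`. This is only an analogue of what CFSDP2 does: the actual CFSDP2 wrapper lives in the Megatron branch referenced above, and the model name and import paths here are illustrative.

```python
import torch
import torch.distributed as dist
from torch.distributed._composable.fsdp import fully_shard  # torch-native FSDP2
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group("nccl")  # assumes launch via torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Wrap each "unit module" (here, a decoder layer) individually so its
# parameters, gradients, and optimizer state are sharded across data-parallel
# ranks and gathered/resharded layer by layer during forward/backward.
for module in model.modules():
    if isinstance(module, LlamaDecoderLayer):
        fully_shard(module)
fully_shard(model)  # shard the remaining root-level parameters
```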

Usage

  • To use CFSDP2, pass the --cfsdp2 flag and populate --cfsdp2-unit-modules with the string class paths of all layers that CFSDP2 should manage, e.g. --cfsdp2-unit-modules transformers.models.llama.modeling_llama.LlamaDecoderLayer.
```bash
torchrun --nproc-per-node 8 examples/llm/sft/automodel.py \
  --strategy fsdp2 --num-nodes 1 --devices 8 --dp-size 8 --cp-size 1 \
  --global-batch-size 32 --micro-batch-size 1 --accumulate_grad_batches 4 \
  --lr 3e-6 --seq-length 8192 --max-steps 10000 --log-every-n-steps 1 \
  --limit-val-batches 0.025 --trust-remote-code \
  --attn-implementation flash_attention_2 --use-chunked-ce \
  --cfsdp2 --cfsdp2-unit-modules transformers.models.llama.modeling_llama.LlamaDecoderLayer
```
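As a rough illustration of how the string class paths passed via --cfsdp2-unit-modules could be resolved into module classes for wrapping (the helper name `resolve_class_path` is hypothetical, not the actual Automodel implementation):

```python
import importlib

def resolve_class_path(class_path: str) -> type:
    """Resolve a dotted class path, e.g.
    'transformers.models.llama.modeling_llama.LlamaDecoderLayer'."""
    module_name, _, class_name = class_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)

# Classes resolved this way can then be matched against model.modules()
# to decide which submodules CFSDP2 treats as sharding units.
unit_modules = [
    resolve_class_path(p)
    for p in ["transformers.models.llama.modeling_llama.LlamaDecoderLayer"]
]
```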
