
[WIP][FSDP,Training] feat: Support FSDP2 FP8 training

Open • eternally-z opened this issue 1 month ago • 1 comment

What does this PR do?

This PR adds FP8 training support to the FSDP2 strategy, implemented with torchao, on top of the current main branch.

This work builds on the earlier attempt in PR https://github.com/volcengine/verl/pull/1490 by @horsebridge. Because verl's architecture has changed significantly since then, the original PR conflicts with the current main branch. This PR ports that implementation to the current architecture and re-enables FP8 training.
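
To make the "FP8" part concrete, the pure-Python sketch below illustrates per-tensor (tensorwise) dynamic scaling into the float8 e4m3 format. The tensor's absolute maximum is mapped onto 448, the largest finite e4m3 value. Each element is then rounded to 3 mantissa bits, and the result is rescaled back. This is a simplified illustration under those assumptions, not torchao's implementation: real FP8 training applies such casts to GPU tensors around the linear-layer matmuls. The helper names `round_to_e4m3` and `fp8_quantize_dequantize` are hypothetical.

```python
import math

# Largest finite value of the float8 e4m3 ("e4m3fn") format.
E4M3_MAX = 448.0
# Smallest normal exponent of e4m3 (exponent bias 7): normals start at 2**-6.
E4M3_MIN_NORMAL_EXP = -6
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest e4m3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with m in [0.5, 1), so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    # Subnormals reuse the spacing of the smallest normal binade.
    exponent = max(e - 1, E4M3_MIN_NORMAL_EXP)
    spacing = 2.0 ** (exponent - MANTISSA_BITS)
    # Python's round() is round-half-to-even, matching IEEE default rounding.
    rounded = round(mag / spacing) * spacing
    return sign * min(rounded, E4M3_MAX)


def fp8_quantize_dequantize(values: list[float]) -> tuple[list[float], float]:
    """Hypothetical tensorwise scaling: map amax onto E4M3_MAX, cast, undo the scale."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / max(amax, 1e-12)  # guard against an all-zero tensor
    dequantized = [round_to_e4m3(v * scale) / scale for v in values]
    return dequantized, scale


if __name__ == "__main__":
    deq, scale = fp8_quantize_dequantize([0.5, -2.0, 3.0])
    print(scale, deq)
```

The per-tensor scale uses the full e4m3 range for the largest element. With only 3 mantissa bits, each normal value still carries a relative rounding error of up to 2**-4.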

TODO List:

  • [ ] We are running experiments that combine FSDP2 FP8 training with FP8 rollout (based on SGLang). Results and verification details will be added here once available.

Checklist Before Starting

  • [ ] Search for similar PRs. Paste at least one query link here: ...
  • [ ] Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with commas, e.g. [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
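
The title format above is regular enough to check mechanically. The sketch below is a hypothetical validator built only from the module and type lists in this checklist. `check_pr_title` is not verl's CI check, and the real check may differ in details such as whitespace handling after commas.

```python
import re

# Module and type vocabularies copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional [BREAKING] prefix, then [modules], a space, and "type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([^\]]+)\] (\w+): (.+)$")


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems with `title`; an empty list means it looks valid."""
    match = TITLE_RE.match(title)
    if match is None:
        return ["title does not match '[{modules}] {type}: {description}'"]
    problems = []
    # Modules are comma-separated, with optional spaces after each comma.
    for module in (m.strip() for m in match.group(2).split(",")):
        if module not in MODULES:
            problems.append(f"unknown module: {module!r}")
    if match.group(3) not in TYPES:
        problems.append(f"unknown type: {match.group(3)!r}")
    return problems


if __name__ == "__main__":
    print(check_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))
```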

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes, if at all, and provide usage example(s) if possible.

actor_rollout_ref:
  actor:
    # Note: FP8 training currently requires the fsdp2 strategy
    strategy: fsdp2
    fsdp_config:
      fp8: True

critic:
  # Note: FP8 training currently requires the fsdp2 strategy
  strategy: fsdp2
  model:
    fsdp_config:
      fp8: True
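
The config comments state that FP8 training currently requires the fsdp2 strategy. A minimal sketch of a consistency check for that rule follows. It operates on the config as a plain nested dict, the shape that PyYAML's `yaml.safe_load` produces from the YAML above. `find_fp8_config_errors` is a hypothetical helper for illustration, not the validation code in this PR or in verl.

```python
def find_fp8_config_errors(config: dict) -> list[str]:
    """Hypothetical check: report roles with fp8=True but a strategy other than fsdp2."""
    actor = config.get("actor_rollout_ref", {}).get("actor", {})
    critic = config.get("critic", {})
    # (role, strategy, fsdp_config) for the two sections shown in the YAML above;
    # note that the critic's fsdp_config sits under `model`.
    sections = [
        ("actor_rollout_ref.actor", actor.get("strategy"), actor.get("fsdp_config", {})),
        ("critic", critic.get("strategy"), critic.get("model", {}).get("fsdp_config", {})),
    ]
    errors = []
    for role, strategy, fsdp_config in sections:
        if fsdp_config.get("fp8", False) and strategy != "fsdp2":
            errors.append(f"{role}: fp8=True requires strategy 'fsdp2', got {strategy!r}")
    return errors


if __name__ == "__main__":
    cfg = {"critic": {"strategy": "fsdp", "model": {"fsdp_config": {"fp8": True}}}}
    print(find_fp8_config_errors(cfg))
```

Returning a list rather than raising on the first problem lets a launcher report every misconfigured role at once.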

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important: Please check all of the following items before requesting a review; otherwise, the reviewer may deprioritize this PR.

eternally-z, Nov 27 '25 12:11

CLA assistant check
All committers have signed the CLA.

CLAassistant, Nov 27 '25 12:11