verl
Quantization aware training (QAT) support
Feature request
The current codebase only supports bf16/fp16 training, while quantization (int8, int4, fp8, fp4) is typically applied during model serving to reduce VRAM usage while maintaining accuracy. PyTorch supports Quantization-Aware Training (QAT) (https://pytorch.org/blog/quantization-aware-training/), and it would be great if verl could support this as well to achieve better post-quantization results.
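For reference, a minimal sketch of the prepare → train → convert flow described in that blog post, based on the torchao prototype QAT API at the time of the post; the module path (`torchao.quantization.prototype.qat`), the quantizer name, and `build_model()` are assumptions for illustration and may differ in current releases:

```python
import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

# Placeholder: any bf16 causal LM loaded the usual way would work here.
model = build_model().to(torch.bfloat16).cuda()

# Insert fake-quantization into linear layers:
# int8 dynamic activations, int4 grouped weights.
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# ... run the normal verl training loop on `model`,
#     so it learns to be robust to quantization noise ...

# Replace fake-quantized modules with actually quantized ones for serving.
model = qat_quantizer.convert(model)
```

The key point is that the training loop itself stays in bf16/fp16; only the forward pass simulates the target quantization, which is why integrating this into verl's existing trainers should be feasible.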
Motivation
Training a model with verl gives very good results in bf16/fp16, but quality seems to degrade significantly after quantizing to int4. QAT support could help close the accuracy gap between the bf16 and int4 models.