verl
Quantization aware training (QAT) support
Feature request
The current codebase only supports bf16/fp16 training, while quantization (int8, int4, fp8, fp4) is typically applied during model serving to reduce VRAM usage while maintaining accuracy. PyTorch supports Quantization-Aware Training (QAT) (https://pytorch.org/blog/quantization-aware-training/), and it would be great if verl could support this as well to achieve better post-quantization results.
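For reference, a minimal sketch of the prepare → train → convert flow described in that blog post, based on the torchao prototype QAT API at the time of the post; the module path (`torchao.quantization.prototype.qat`), the quantizer name, and `build_model()` are assumptions for illustration and may differ in current releases:

```python
import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

# Placeholder: any bf16 causal LM loaded the usual way would work here.
model = build_model().to(torch.bfloat16).cuda()

# Insert fake-quantization into linear layers:
# int8 dynamic activations, int4 grouped weights.
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# ... run the normal verl training loop on `model`,
#     so it learns to be robust to quantization noise ...

# Replace fake-quantized modules with actually quantized ones for serving.
model = qat_quantizer.convert(model)
```

The key point is that the training loop itself stays in bf16/fp16; only the forward pass simulates the target quantization, which is why integrating this into verl's existing trainers should be feasible.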
Motivation
Training a model with verl gives very good results in bf16/fp16, but quality seems to degrade significantly after quantizing to int4. QAT support could help close the accuracy gap between the bf16 and int4 models.