
[Quantization + FSDP] Support `quantize_()` for DTensor

Open gau-nernst opened this issue 1 year ago • 6 comments

While trying out INT8 mixed precision pretraining (#748) with torchtitan, I came across an issue: if the model is FSDP-sharded, `quantize_()` won't work. The fix would be to add extra logic to handle DTensor, similar to what FP8 does:

https://github.com/pytorch/ao/blob/f5703b07acc683653556d04ef970709ba47dba10/torchao/float8/float8_tensor.py#L161-L183
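The linked float8 code follows a "unwrap, convert the local shard, rewrap" pattern. A minimal sketch of that pattern, using plain-Python stand-ins for `DTensor` and the quantized tensor subclass (all names below are illustrative, not the real torch/torchao APIs):

```python
from dataclasses import dataclass

# Stand-in for torch.distributed.tensor.DTensor: a wrapper holding this
# rank's local shard plus sharding metadata. Illustrative only.
@dataclass
class FakeDTensor:
    local_tensor: object   # the local shard's data
    placements: tuple      # sharding metadata, carried through unchanged

# Stand-in for a quantized tensor subclass (e.g. an int8 weight).
@dataclass
class FakeQuantizedTensor:
    int_data: list
    scale: float

def quantize_local(values):
    """Symmetric absmax int8 quantization of a flat list of floats."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return FakeQuantizedTensor([round(v / scale) for v in values], scale)

def quantize_maybe_dtensor(tensor):
    """The pattern the float8 code uses: if given a DTensor, quantize
    only the local shard, then rewrap so sharding metadata survives."""
    if isinstance(tensor, FakeDTensor):
        inner = quantize_local(tensor.local_tensor)
        return FakeDTensor(inner, tensor.placements)  # rewrap
    return quantize_local(tensor)

# A "sharded" weight: this rank holds one shard of the full weight.
sharded = FakeDTensor([0.5, -1.0, 0.25], placements=("Shard(0)",))
out = quantize_maybe_dtensor(sharded)
assert isinstance(out, FakeDTensor)
assert isinstance(out.local_tensor, FakeQuantizedTensor)
```

The point of the rewrap is that FSDP keeps seeing a DTensor with the same placements, while the payload it shards is now the quantized subclass.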

gau-nernst avatar Sep 04 '24 02:09 gau-nernst

Yeah this came up in some discussions with inference providers like SGLang as well

msaroufim avatar Sep 04 '24 05:09 msaroufim

@jerryzh168 @kwen2501 is this addressed now with quantize + distributed inference composability work?

supriyar avatar Jan 09 '25 05:01 supriyar

This is not addressed yet. I think this is a training use case that we can explore in 2025 H1 together with @vkuzo. We also need a guide on how DTensor composes with quantization, in both inference and training use cases.

jerryzh168 avatar Jan 09 '25 06:01 jerryzh168

@jerryzh168 is there any update for this issue? Cheers!

Andy0422 avatar Apr 29 '25 02:04 Andy0422

see some examples in https://github.com/pytorch/ao/blob/main/test/float8/test_fsdp.py

we'll be using quantize_ API everywhere, but maybe not yet for https://github.com/pytorch/ao/blob/137b0795acb3282ce622948b1537e20914186eea/test/float8/test_fsdp.py#L88, cc @vkuzo @danielvegamyhre on the plan to move to quantize_ API there

what is your use case @Andy0422

jerryzh168 avatar Apr 29 '25 18:04 jerryzh168

> see some examples in https://github.com/pytorch/ao/blob/main/test/float8/test_fsdp.py
>
> we'll be using quantize_ API everywhere, but maybe not yet for `convert_to_float8_training` (ao/test/float8/test_fsdp.py, line 88 in 137b079), cc @vkuzo @danielvegamyhre on the plan to move to quantize_ API there
>
> what is your use case @Andy0422

My use case is inference. I'd like to use the quantize_ API, e.g. `Int8DynamicActivationInt8WeightConfig`, to quantize the weights first, then shard and move the weights and scales with FSDP, and hopefully use int8 all-reduce.
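Conceptually, the weight side of a config like `Int8DynamicActivationInt8WeightConfig` replaces each linear weight with an int8 tensor plus a per-row scale; those are the two pieces FSDP would then need to shard and move together. A rough stand-alone illustration of that weight transform (plain Python, not the torchao implementation):

```python
def quantize_weight_per_row(weight):
    """Symmetric per-row absmax int8 quantization: each output row gets
    its own scale, so int8 matmul results can be rescaled per row."""
    int_rows, scales = [], []
    for row in weight:
        scale = max(abs(v) for v in row) / 127 or 1.0
        scales.append(scale)
        int_rows.append([round(v / scale) for v in row])
    return int_rows, scales

def dequantize(int_rows, scales):
    """Recover approximate float weights from int8 data + scales."""
    return [[q * s for q in row] for row, s in zip(int_rows, scales)]

weight = [[0.1, -0.4], [3.0, 1.0]]
int_rows, scales = quantize_weight_per_row(weight)
recovered = dequantize(int_rows, scales)
# Round-trip error is bounded by half a quantization step per element.
assert all(
    abs(a - b) <= s / 2 + 1e-9
    for orig_row, rec_row, s in zip(weight, recovered, scales)
    for a, b in zip(orig_row, rec_row)
)
```

For the DTensor case above, both `int_rows` and `scales` would need to travel together under FSDP, which is exactly why `quantize_()` needs DTensor-aware handling rather than quantizing the wrapper blindly.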

Andy0422 avatar Apr 30 '25 01:04 Andy0422