[Tracker] WIP features for torchao 0.5
Branch cut: Sept 5th
Release: Sept 10th
Low-precision inference
- [x] Make IntX dtypes work with AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/672) (see the sketch after this list)
- [ ] Performance benchmarks for intX dtypes on relevant shapes; identify potential improvements to torch.compile @HDCharles @jerryzh168
- [x] Make quant_llm (quant/dequant/pack/unpack) available through AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/772)
- [x] QAT Tensor subclass based implementation https://github.com/pytorch/ao/pull/585 @andrewor14
- [x] New feature enablement for quantized training / QAT / quantized fine-tuning (TBD) @andrewor14 @gau-nernst
- [x] KV Cache quantization for llama3.1 w/ long context @HDCharles
- [ ] Float8 inference flow @jainapurva @drisspg
- [x] Float8 gemm benchmarks on relevant shapes @vkuzo @jainapurva
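For context on the AffineQuantizedTensor items above, here is a minimal sketch of torchao's weight-only quantization flow using the existing `int4_weight_only` config; an analogous config for sub-byte uintX dtypes is assumed to follow the same pattern, and CUDA + bfloat16 are assumed.

```python
# Minimal sketch of weight-only quantization backed by AffineQuantizedTensor.
# Uses the existing int4_weight_only config; a sibling config for sub-byte
# uintX dtypes (the IntX item above) is an assumption and not shown here.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

model = (
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
    .to(torch.bfloat16)
    .cuda()
)

# Swap Linear weights for AffineQuantizedTensor-backed int4 weights in place.
quantize_(model, int4_weight_only(group_size=128))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")
out = torch.compile(model)(x)  # the torch.compile path is what the benchmarks exercise
```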
Low-precision training
- [x] Float8 training recipes @vkuzo (see the sketch after this list)
- [ ] Make AQT trainable, i.e. compose with DTensor and FSDP2 @andrewor14 @jerryzh168
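A minimal sketch of the float8 training flow, assuming torchao.float8's `convert_to_float8_training` entry point and a CUDA + bfloat16 setup; the tracked recipe and FSDP2/DTensor work builds on top of this conversion.

```python
# Minimal sketch: swap nn.Linear modules for float8 linears so the matmuls in
# forward and backward run in float8 during training.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = (
    nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 2048))
    .to(torch.bfloat16)
    .cuda()
)
convert_to_float8_training(model)  # replaces nn.Linear with float8 linears in place

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(16, 2048, dtype=torch.bfloat16, device="cuda")
loss = model(x).float().pow(2).mean()
loss.backward()
optimizer.step()
```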
Sparsity
- [x] Hopper + FP8 support in core @jcaip
- [x] 2:4 sparsity + BSR benchmarks for ViT model @jcaip (see the sparsity sketch after this list)
- [ ] int8 BSR support
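A minimal sketch of 2:4 semi-structured sparsity acceleration, assuming torchao.sparsity's `sparsify_` / `semi_sparse_weight` API (CUDA and fp16 assumed); it covers only the 2:4 path, not the BSR items.

```python
# Minimal sketch: accelerate Linear layers with 2:4 semi-structured sparse weights.
# Weights should already be pruned to a 2:4 pattern for numerics to be preserved.
import torch
import torch.nn as nn
from torchao.sparsity import sparsify_, semi_sparse_weight

model = (
    nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
    .half()
    .cuda()
)
sparsify_(model, semi_sparse_weight())  # swap weights for semi-structured sparse tensors

out = model(torch.randn(64, 1024, dtype=torch.float16, device="cuda"))
```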
Benchmarking
- [ ] Set up torchao benchmarking in torchbench @HDCharles
Blog
- [x] README revamp @msaroufim
- [ ] torchao hard launch blogpost @msaroufim
- [x] Low-bit quantization w/ HQQ @HDCharles @mobicham