[Tracker] WIP features for torchao 0.5
Branch cut: Sept 5th
Release: Sept 10th
Low-precision inference
- [x] Make IntX dtypes work with AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/672) (see the sketch after this list)
- [ ] Performance benchmarks for intX dtypes on relevant shapes; identify potential improvements to torch.compile @HDCharles @jerryzh168
- [x] Make quant_llm (quant/dequant/pack/unpack) available through AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/772)
- [x] QAT Tensor subclass based implementation https://github.com/pytorch/ao/pull/585 @andrewor14
- [x] New feature enablement for quantized training / QAT / quantized fine-tuning (TBD) @andrewor14 @gau-nernst
- [x] KV Cache quantization for llama3.1 w/ long context @HDCharles
- [ ] Float8 inference flow @jainapurva @drisspg
- [x] Float8 gemm benchmarks on relevant shapes @vkuzo @jainapurva
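For context on the AffineQuantizedTensor items above, here is a minimal sketch of torchao's weight-only quantization flow using the existing `int4_weight_only` config; an analogous config for sub-byte uintX dtypes is assumed to follow the same pattern, and CUDA + bfloat16 are assumed.

```python
# Minimal sketch of weight-only quantization backed by AffineQuantizedTensor.
# Uses the existing int4_weight_only config; a sibling config for sub-byte
# uintX dtypes (the IntX item above) is an assumption and not shown here.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

model = (
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
    .to(torch.bfloat16)
    .cuda()
)

# Swap Linear weights for AffineQuantizedTensor-backed int4 weights in place.
quantize_(model, int4_weight_only(group_size=128))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")
out = torch.compile(model)(x)  # the torch.compile path is what the benchmarks exercise
```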
Low-precision training
- [x] Float8 training recipes @vkuzo (see the sketch after this list)
- [ ] Make AQT trainable, i.e. compose with DTensor and FSDP2 @andrewor14 @jerryzh168
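A minimal sketch of the float8 training flow, assuming torchao.float8's `convert_to_float8_training` entry point and a CUDA + bfloat16 setup; the tracked recipe and FSDP2/DTensor work builds on top of this conversion.

```python
# Minimal sketch: swap nn.Linear modules for float8 linears so the matmuls in
# forward and backward run in float8 during training.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = (
    nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 2048))
    .to(torch.bfloat16)
    .cuda()
)
convert_to_float8_training(model)  # replaces nn.Linear with float8 linears in place

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(16, 2048, dtype=torch.bfloat16, device="cuda")
loss = model(x).float().pow(2).mean()
loss.backward()
optimizer.step()
```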
Sparsity
- [x] Hopper + FP8 support in core @jcaip
- [x] 2:4 sparsity + BSR benchmarks for ViT model @jcaip (see the sparsity sketch after this list)
- [ ] int8 BSR support
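A minimal sketch of 2:4 semi-structured sparsity acceleration, assuming torchao.sparsity's `sparsify_` / `semi_sparse_weight` API (CUDA and fp16 assumed); it covers only the 2:4 path, not the BSR items.

```python
# Minimal sketch: accelerate Linear layers with 2:4 semi-structured sparse weights.
# Weights should already be pruned to a 2:4 pattern for numerics to be preserved.
import torch
import torch.nn as nn
from torchao.sparsity import sparsify_, semi_sparse_weight

model = (
    nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
    .half()
    .cuda()
)
sparsify_(model, semi_sparse_weight())  # swap weights for semi-structured sparse tensors

out = model(torch.randn(64, 1024, dtype=torch.float16, device="cuda"))
```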
Benchmarking
- [ ] Set up torchao benchmarking in torchbench @HDCharles
Blog
- [x] README revamp @msaroufim
- [ ] torchao hard launch blogpost @msaroufim
- [x] Low-bit quantization w/ HQQ @HDCharles @mobicham