[Tracker] WIP features for torchao 0.5

Open • supriyar opened this issue 6 months ago • 0 comments

Branch cut: Sept 5th
Release: Sept 10th

Low-precision inference

  • [x] Make IntX dtypes work with AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/672)
  • [ ] Performance benchmarks for intX dtypes on relevant shapes, identify potential improvements to torch.compile @HDCharles @jerryzh168
  • [x] Make quant_llm (quant/dequant/pack/unpack) available through AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/772)
  • [x] QAT Tensor subclass based implementation https://github.com/pytorch/ao/pull/585 @andrewor14
  • [x] TBD new feature enablement on quantized training/QAT/quantized fine-tuning @andrewor14 @gau-nernst
  • [x] KV Cache quantization for llama3.1 w/ long context @HDCharles
  • [ ] Float8 inference flow @jainapurva @drisspg
  • [x] Float8 gemm benchmarks on relevant shapes @vkuzo @jainapurva

Low-precision training

  • [x] Float8 training recipes @vkuzo
  • [ ] Make AQT trainable, i.e. compose with DTensor and FSDP2 @andrewor14 @jerryzh168

Sparsity

  • [x] Hopper + FP8 support in core @jcaip
  • [x] 2:4 sparsity + BSR benchmarks for ViT model @jcaip
  • [ ] int8 BSR support

Benchmarking

  • [ ] Set up torchao benchmarking in torchbench @HDCharles

Blog

  • [x] README revamp @msaroufim
  • [ ] torchao hard launch blogpost @msaroufim
  • [x] Low-bit quantization w/ HQQ @HDCharles @mobicham

supriyar, Aug 13 '24 17:08