AO dtype composability tracker
As we onboard more dtypes, we ideally want them to work in as many situations as possible, so I'm opening this tracker and will update the table as things change. If I should add more columns or rows, or if there are any cells you disagree with, please let me know!
The columns can also compose with each other, but to be explicit:
- Training with FSDP2 should compose with low-bit optimizers
- Inference quantization and KV cache quantization should compose

And sparsity, IIUC, only works with int8 inference quantization right now.
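To make the "Inference" column concrete: for int8 this boils down to replacing fp32/bf16 weights with int8 values plus a per-channel scale, and dequantizing (or computing in int8) at matmul time. Below is a minimal numpy sketch of symmetric per-channel int8 weight quantization; it is an illustration only, not torchao's implementation (torchao's kernels and tensor layouts are more involved).

```python
import numpy as np

def quantize_int8_per_channel(w):
    # Symmetric per-output-channel int8 quantization:
    # map the max |value| in each row to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32)).astype(np.float32)  # weight
x = rng.standard_normal((32, 8)).astype(np.float32)   # activation

q, scale = quantize_int8_per_channel(w)
y_fp32 = w @ x
y_int8 = dequantize(q, scale) @ x

# Per-channel scales keep the matmul error small relative to fp32.
rel_err = np.abs(y_fp32 - y_int8).max() / np.abs(y_fp32).max()
```

The per-channel (rather than per-tensor) scale is what keeps accuracy acceptable for weights whose ranges differ a lot between output channels.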
Dtype | Training with FSDP2 | Inference | Optimizer | QAT | KV cache | Notes |
---|---|---|---|---|---|---|
Int8 | Experimental | Yes | Yes | LUT based | Yes | |
Int4 | No | Yes | Yes | LUT based | No | |
Fp8 | Yes | Yes | Yes | Not needed | No | |
NF4 | Yes | Experimental | No | In progress | No | Does not use quantize api |
fp6 | No | Yes | No | No | No | |
UintX/Fpx | In progress | Yes | No | No | No | Still requires more performance work |
MX: fp8/6/4 with scales | Emulation only | Emulation only | No | Not needed because we can compute in this dtype | No | Pending release of B100 GPUs for acceleration |
Autoquant | N/A | Yes | N/A | N/A | N/A | Supports int8/4. Fp8 coming next |
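Since KV cache quantization is one of the composition axes above, here is a toy numpy sketch of the idea: store attention keys/values as int8 plus a floating-point scale, and dequantize when the cache is read. The `Int8KVCache` class and its per-tensor scaling are hypothetical illustrations; real implementations quantize per head or per group and fuse dequantization into the attention kernel.

```python
import numpy as np

class Int8KVCache:
    """Toy KV cache storing keys/values as int8 plus a per-entry fp scale.

    A conceptual sketch only -- not torchao's layout or API.
    """

    def __init__(self):
        self.entries = []  # (q_k, scale_k, q_v, scale_v) per decode step

    @staticmethod
    def _quant(t):
        # Symmetric per-tensor int8: map max |t| to 127.
        scale = max(float(np.abs(t).max()) / 127.0, 1e-8)
        q = np.clip(np.round(t / scale), -128, 127).astype(np.int8)
        return q, scale

    def append(self, k, v):
        self.entries.append(self._quant(k) + self._quant(v))

    def materialize(self):
        # Dequantize the whole cache back to fp32 for attention.
        ks = np.stack([q.astype(np.float32) * s for q, s, _, _ in self.entries])
        vs = np.stack([q.astype(np.float32) * s for _, _, q, s in self.entries])
        return ks, vs

rng = np.random.default_rng(1)
k_orig = rng.standard_normal((4, 64)).astype(np.float32)
v_orig = rng.standard_normal((4, 64)).astype(np.float32)

cache = Int8KVCache()
for k, v in zip(k_orig, v_orig):
    cache.append(k, v)

k_deq, v_deq = cache.materialize()
```

The memory win is what matters here: each cached element shrinks from 2-4 bytes to 1 byte plus a small per-entry scale, which is why this composes naturally with weight quantization for long-context inference.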
TODO
- [ ] Separate table where columns are weights, activations, optimizer and gradients
- [ ] Separate table where techniques are rows and columns are devices