torchchat
Explore expanding dynamic quantization kernels (broaden a8w4dq support)
Add support for:
1 - Asymmetric a8w4dq. This basically requires subtracting the zero point from each weight value before multiplying, so it should only add a single multiply. This will help accelerate and better handle GGUF files on ExecuTorch, and allow reading Q4_0 for export to mobile.
2 - a8w4dq on desktop.
3 - The asymmetric version of (1) on desktop.
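To make the "single multiply" point in (1) concrete, here is a minimal NumPy sketch of an asymmetric a8w4dq linear layer. It is a reference illustration only, not the torchchat or ExecuTorch kernel: all function names are hypothetical, the group size of 32 is an assumption, and activations are quantized symmetrically per row for simplicity. The idea is that with `w ~= scale * (q - zero)`, the zero point can be factored out of the inner loop, so each weight group costs one extra multiply (`zero * sum(qa)`) on top of the integer dot product.

```python
import numpy as np

def quantize_weights_4bit_asym(w, group_size=32):
    """Per-group asymmetric 4-bit quantization: w ~= scale * (q - zero).

    Hypothetical helper; assumes in_features is divisible by group_size.
    """
    out_features, in_features = w.shape
    g = w.reshape(out_features, in_features // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / 15.0      # 4 bits -> 16 levels
    zero = np.clip(np.round(-w_min / scale), 0, 15)
    q = np.clip(np.round(g / scale) + zero, 0, 15).astype(np.int32)
    return q, scale, zero.astype(np.int32)

def dequantize_weights(q, scale, zero):
    """Reconstruct float weights from the quantized groups."""
    out_features = q.shape[0]
    return (scale * (q - zero)).reshape(out_features, -1)

def quantize_activations_int8(x):
    """Dynamic per-row symmetric int8 quantization (an assumption of this sketch)."""
    a_scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True), 1e-8) / 127.0
    qa = np.clip(np.round(x / a_scale), -127, 127).astype(np.int32)
    return qa, a_scale

def a8w4dq_linear(x, q, scale, zero):
    """Compute x @ W.T using integer dot products plus a zero-point correction."""
    _, n_groups, group_size = q.shape
    qa, a_scale = quantize_activations_int8(x)
    qa_g = qa.reshape(x.shape[0], n_groups, group_size)
    # Integer dot product per (row, output channel, group).
    acc = np.einsum("mgk,ngk->mng", qa_g, q)
    # Zero-point correction: one multiply per group, using the activation sum.
    a_sum = qa_g.sum(axis=-1)                            # shape (m, groups)
    acc = acc - zero[..., 0][None, :, :] * a_sum[:, None, :]
    # Apply the per-group weight scales, then the per-row activation scale.
    return (acc * scale[..., 0][None, :, :]).sum(axis=-1) * a_scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64))
x = rng.standard_normal((4, 64))
q, scale, zero = quantize_weights_4bit_asym(w)
y = a8w4dq_linear(x, q, scale, zero)
```

The symmetric case is the same computation with the correction term dropped, which is why the asymmetric variant should be a small extension of an existing symmetric kernel.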