tract
Does tract support quantized ONNX models?
Just curious if anyone tried it with tract.
"Quantized models" is a very overloaded term :) tract has some support for the quantized matmul and convolution operators in ONNX (QLinearMatMul and QLinearConv), but ONNX lagged behind the state of the art for a long time with regard to model quantization and compression (which can maybe be explained by a focus on training-side concerns). For instance, last time I checked, there was no support for a Q8-like type (with scale and offset) in the regular arithmetic operations in tract (Add, Mul, ...), making it a difficult format for managing quantized models. And with the LLM boom, it feels like the community is moving to more bespoke formats like GGML...
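To make the "scale and offset" point a bit more concrete, here is a minimal sketch in plain Rust of what a Q8-like value means and why a plain Add on quantized tensors is awkward. This is not tract's API: `Q8Params`, `quantize`, `dequantize` and `q_add` are hypothetical names for illustration, and the add goes through floats for clarity, whereas a real kernel would more likely use integer requantization.

```rust
/// Hypothetical Q8-like parameters: real = (q - zero_point) * scale.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Q8Params {
    scale: f32,
    zero_point: i32,
}

/// Map a real value to an 8-bit integer, saturating at the i8 range.
fn quantize(x: f32, p: Q8Params) -> i8 {
    let q = (x / p.scale).round() as i32 + p.zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Map an 8-bit integer back to the real value it represents.
fn dequantize(q: i8, p: Q8Params) -> f32 {
    (q as i32 - p.zero_point) as f32 * p.scale
}

/// Add two quantized values that may use different parameters.
/// The raw integers cannot simply be summed: each operand has to be
/// interpreted with its own scale and offset, and the result has to be
/// requantized with the output parameters.
fn q_add(a: i8, pa: Q8Params, b: i8, pb: Q8Params, out: Q8Params) -> i8 {
    quantize(dequantize(a, pa) + dequantize(b, pb), out)
}
```

The takeaway is that every elementwise operator needs to know the scale and offset of each input and of the output, so if the type system or the format does not carry them, they have to be threaded through by hand.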