Does tract support quantized onnx models?

Open ARKEYTECT opened this issue 1 year ago • 1 comment

Just curious if anyone tried it with tract.

ARKEYTECT, Jan 08 '25 19:01

"Quantized models" is a very overloaded term :) tract has some support for the QLinearMatMul and QLinearConv operators in ONNX, but ONNX lagged behind the SOTA for a long time with regard to model quantization and compression (which can maybe be explained by a focus on the training side of things). For instance, last time I checked, there was no support for a Q8-like type (with scale and offset) in the regular ONNX arithmetic operators (Add, Mul, ...), which makes it a difficult format for managing quantized models. And with the LLM boom, it feels like the community is moving to more bespoke formats like GGML...

kali, Jan 20 '25 08:01
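
For readers unfamiliar with the "Q8-like type (with scale and offset)" mentioned in the reply above, here is a minimal sketch of the idea, not code from tract or ONNX. The helper names (`quantize`, `dequantize`, `quantized_add`) and the parameter values are made up for illustration. It assumes unsigned 8-bit codes and shows why a plain Add on the raw integers gives the wrong answer when the operands carry different scales and offsets.

```python
# Illustrative only: these helpers are hypothetical and are not part of tract or ONNX.

def quantize(x, scale, zero_point):
    """Map a real value to an unsigned 8-bit code: q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # saturate to the u8 range


def dequantize(q, scale, zero_point):
    """Recover the approximate real value: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def quantized_add(qa, a_params, qb, b_params, out_params):
    """Add two quantized values whose (scale, zero_point) pairs may differ.

    Each operand is decoded with its own parameters, the sum is taken in
    real numbers, and the result is re-encoded with the output parameters.
    """
    real_sum = dequantize(qa, *a_params) + dequantize(qb, *b_params)
    return quantize(real_sum, *out_params)


# Assumed (scale, zero_point) pairs, chosen as powers of two so the
# floating-point arithmetic below is exact.
a_params = (0.5, 128)
b_params = (0.25, 0)
out_params = (0.5, 128)

qa = quantize(1.5, *a_params)
qb = quantize(2.0, *b_params)

q_sum = quantized_add(qa, a_params, qb, b_params, out_params)
naive = qa + qb  # what an integer Add unaware of scale and offset would compute

print("parameter-aware add:", dequantize(q_sum, *out_params))
print("naive integer add:  ", dequantize(naive, *out_params))
```

The parameter-aware version decodes to the expected 1.5 + 2.0, while the naive integer sum decodes to a different value. An elementwise operator that only sees raw integers cannot do this bookkeeping by itself, which is the difficulty the reply describes.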