Atchuth Naveen Ch
Thank you @supriyar for pointing me to the gemlite kernels and thanks to @mobicham for the work on gemlite. I am able to optimize my model using both torchao int4...
My bad, I had only evaluated with a batch size of 1. With larger batch sizes, I do see the performance gain from the gemlite kernels.
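For anyone who wants to repeat the batch-size comparison above, here is a minimal sketch of a throughput sweep. It is not taken from torchao, gemlite, or hqq: `tokens_per_second`, `sweep`, and `fake_generate` are hypothetical helper names, and `fake_generate` is only a stand-in that sleeps instead of running a model. To measure a real backend, you would pass your own callable that generates `new_tokens` tokens for `batch_size` sequences.

```python
import time


def tokens_per_second(generate_fn, batch_size, new_tokens, warmup=1, repeats=3):
    """Return generated tokens per second for one batch size.

    generate_fn(batch_size, new_tokens) is a hypothetical callable that runs
    one full generation. For GPU models it should block until the work is
    finished (for example by calling torch.cuda.synchronize() before
    returning), otherwise the timing only covers kernel launches.
    """
    # Warm-up runs are excluded from timing (compilation, autotuning, caches).
    for _ in range(warmup):
        generate_fn(batch_size, new_tokens)

    start = time.perf_counter()
    for _ in range(repeats):
        generate_fn(batch_size, new_tokens)
    elapsed = time.perf_counter() - start

    # Total tokens produced across all timed runs, divided by wall time.
    return batch_size * new_tokens * repeats / elapsed


def sweep(generate_fn, batch_sizes=(1, 4, 16), new_tokens=32):
    """Measure throughput for each batch size and return {batch_size: tok/s}."""
    return {bs: tokens_per_second(generate_fn, bs, new_tokens) for bs in batch_sizes}


def fake_generate(batch_size, new_tokens):
    """Stand-in for a model: per-step latency independent of batch size."""
    time.sleep(0.001 * new_tokens)


if __name__ == "__main__":
    for bs, tps in sweep(fake_generate).items():
        print(f"batch_size={bs}: {tps:.1f} tok/s")
```

Running the same sweep once per backend makes it easy to see at which batch size one backend starts to pull ahead of the other, rather than drawing a conclusion from batch size 1 alone.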
Hi, I ran the script at https://github.com/mobiusml/hqq/blob/master/examples/hqq_lib_demo.py to cross-check for a batch size of 1. The `torchao` backend seems to run faster, at 156 tok/s vs. 116 tok/s for the gemlite backend. What...