Casper
Hi @abhinavkulkarni, I would love to support more models - especially embedding models. If you have the time to add support for these models, I would highly...
Which transformers version are you using? Could you try `4.34.1` and `4.35.2`? A little background... recently the `4.36` version broke a lot of things around how we cache arguments in...
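For anyone hitting this, pinning `transformers` to a pre-`4.36` release is the quickest workaround (the exact versions to try are the ones mentioned above):

```shell
# Pin transformers to a release from before the 4.36 caching changes
pip install "transformers==4.35.2"
# or, if that still misbehaves, try the slightly older release
pip install "transformers==4.34.1"
```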
It seems the implementation broke a while ago. Unfortunately, I do not currently have the capacity to research old models that break with new updates. I will welcome all PRs...
Hi @zhewang1-intc, thank you for your interest. It would be incredibly exciting to make a CPU-compatible kernel available for AutoAWQ. We already have a CPU-compatible [approach (dequantizing + torch matmul)](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/utils/packing_utils.py#L83),...
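For context, the dequantize-then-matmul fallback linked above boils down to something like this. This is a minimal NumPy sketch with unpacked 4-bit codes and illustrative shapes and names (`dequant_matmul`, `group_size`, etc. are hypothetical, not AutoAWQ's actual packed layout):

```python
import numpy as np

def dequant_matmul(x, qweight, scales, zeros, group_size=2):
    """Dequantize group-quantized weights, then do a plain matmul.

    qweight: (in_features, out_features) integer codes in [0, 15]
    scales, zeros: (in_features // group_size, out_features)
    NOTE: shapes and names are illustrative, not AutoAWQ's packed format.
    """
    # Broadcast each group's scale/zero-point down to its rows
    s = np.repeat(scales, group_size, axis=0)
    z = np.repeat(zeros, group_size, axis=0)
    w = (qweight.astype(np.float32) - z) * s  # dequantized fp32 weight
    return x @ w                              # fallback: plain matmul
```

Since this path is just a dense matmul after dequantization, it runs on any backend torch (or NumPy) supports, which is why it already works on CPU without a dedicated kernel.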
I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would...
#215 should resolve this. I need to test it more to make sure it’s correct. Can you drop an example?
Hi @michaelfeil, great work on this! I am indeed interested in having support for BERT models. However, the main issues you highlighted were the same ones I ran into. Do...
Could you share the code you used, so I can see the difference?
> * Do I need to quantize all layers? I saw that all layers are replaced with GEMM, but I only quantized a few of them. (see the code)

The...
Yeah, I am not sure the new PR was a good fix. I have to assess this further before the next release. Have you been able to identify where the...