Casper

295 comments by Casper

Hi @abhinavkulkarni, I would love to support more models, especially embedding models. If you are fortunate enough to have time to add support for these models, I would highly...

Which transformers version are you using? Could you try `4.34.1` and `4.35.2`? A little background... recently the `4.36` version broke a lot of things around how we cache arguments in...
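For anyone hitting the same issue, pinning back to a pre-`4.36` release is a one-liner (version numbers taken from the comment above; adjust to whichever of the two works for you):

```shell
# Pin transformers to a release from before the 4.36 caching changes
pip install "transformers==4.35.2"   # or: pip install "transformers==4.34.1"
```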

It seems the implementation broke a while ago. Unfortunately, I do not currently have the capacity to research old models that break with new updates. I will welcome all PRs...

Hi @zhewang1-intc, thank you for your interest. It would be incredibly exciting to make a CPU-compatible kernel available for AutoAWQ. We already have a CPU-compatible [approach (dequantizing + torch matmul)](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/utils/packing_utils.py#L83),...
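For readers following along, here is a minimal sketch of what a dequantize-then-matmul fallback looks like. This is illustrative NumPy only; the real code in `packing_utils.py` operates on packed GPU tensors with group-wise scales, and every name below is hypothetical:

```python
import numpy as np

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack eight 4-bit values from each uint32 (lowest nibble first)."""
    shifts = np.arange(0, 32, 4, dtype=np.uint32)
    return (packed[..., None] >> shifts) & 0xF

def dequant_matmul(x, qweight, scales, zeros):
    """Dequantize w = (q - z) * s per output column, then do a plain matmul."""
    w = (qweight.astype(np.float32) - zeros) * scales
    return x @ w
```

On CPU this trades the memory savings of packed weights against a full dequantization on every forward pass; a dedicated CPU kernel would fuse the unpack, dequantize, and matmul steps instead.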

I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would...

#215 should resolve this. I need to test it more to make sure it’s correct. Can you drop an example?

Hi @michaelfeil, great work on this! I am indeed interested in having support for BERT models. However, the main issues you highlighted were the same ones I ran into. Do...

Can you show me the code you used to see the difference?

> * Do I need to quantize all layers? I saw that all layers are replaced with GEMM, but I only quantized a few of them. (see the code) The...
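On the partial-quantization question: the usual pattern is to filter linear layers by name and leave embeddings, norms, and the output head in full precision. A sketch of that selection logic, with purely hypothetical names (AutoAWQ's actual behavior is driven by its quant config, not this helper):

```python
# Hypothetical helper: choose which layers to quantize by module name.
# Embeddings, layer norms, and the LM head are commonly kept in full
# precision; every remaining Linear module is a quantization candidate.
SKIP_SUBSTRINGS = ("embed", "norm", "lm_head")

def layers_to_quantize(named_modules):
    """named_modules: iterable of (name, module_type_name) pairs."""
    return [
        name
        for name, kind in named_modules
        if kind == "Linear" and not any(s in name for s in SKIP_SUBSTRINGS)
    ]
```

If the replacement pass swaps in GEMM modules for every linear layer while only a few were actually calibrated, the uncalibrated ones would run with meaningless scales and zeros, which could explain broken outputs.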

Yeah, I am not sure the new PR was a good fix. I have to assess this further before the next release. Have you been able to identify where the...