Casper
Hi @abhinavkulkarni, I would love to support more models - especially embedding models. If you have the time to add support for these models, I would highly...
Which transformers version are you using? Could you try `4.34.1` and `4.35.2`? A little background... recently the `4.36` version broke a lot of things around how we cache arguments in...
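For anyone hitting this, pinning `transformers` to a pre-`4.36` release is the quickest workaround (the exact versions to try are the ones mentioned above):

```shell
# Pin transformers to a release from before the 4.36 caching changes
pip install "transformers==4.35.2"
# or, if that still misbehaves, try the slightly older release
pip install "transformers==4.34.1"
```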
It seems the implementation broke a while ago. Unfortunately, I do not currently have the capacity to research old models that break with new updates. I will welcome all PRs...
Hi @zhewang1-intc, thank you for your interest. It would be incredibly exciting to make a CPU-compatible kernel available for AutoAWQ. We already have a CPU-compatible [approach (dequantizing + torch matmul)](https://github.com/casper-hansen/AutoAWQ/blob/main/awq/utils/packing_utils.py#L83),...
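For context, the dequantize-then-matmul fallback linked above boils down to something like this. This is a minimal NumPy sketch with unpacked 4-bit codes and illustrative shapes and names (`dequant_matmul`, `group_size`, etc. are hypothetical, not AutoAWQ's actual packed layout):

```python
import numpy as np

def dequant_matmul(x, qweight, scales, zeros, group_size=2):
    """Dequantize group-quantized weights, then do a plain matmul.

    qweight: (in_features, out_features) integer codes in [0, 15]
    scales, zeros: (in_features // group_size, out_features)
    NOTE: shapes and names are illustrative, not AutoAWQ's packed format.
    """
    # Broadcast each group's scale/zero-point down to its rows
    s = np.repeat(scales, group_size, axis=0)
    z = np.repeat(zeros, group_size, axis=0)
    w = (qweight.astype(np.float32) - z) * s  # dequantized fp32 weight
    return x @ w                              # fallback: plain matmul
```

Since this path is just a dense matmul after dequantization, it runs on any backend torch (or NumPy) supports, which is why it already works on CPU without a dedicated kernel.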
I believe it could be possible and would be open to PRs implementing this work. Do note that GPTQ without activation reordering is vastly inferior in accuracy, so this would...
#215 should resolve this. I need to test it more to make sure it’s correct. Can you drop an example?
Hi @michaelfeil, great work on this! I am indeed interested in having support for BERT models. However, the main issues you highlighted were the same ones I ran into. Do...
Could you share the code you used, so I can see the difference?
> * Do I need to quantize all layers? I saw that all layers are replaced with GEMM, but I only quantized a few of them. (see the code)

The...
Yeah, I am not sure the new PR was a good fix. I have to assess this further before the next release. Have you been able to identify where the...