Casper
Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?
@trotsky1997 does this code use different alpha values for X and W? You observed better perplexity with that.
Which CPU are you using? And can you post your full code, including how you load the models? Also, it looks like you did not try out TinyChat, which offers a...
Here is some feedback.

1. This part should not be a loop; just run `tokenizer.decode` on `generation_output` and use `token_num += len(generation_output)`.

```python
for output in generation_output:
    tokenizer.decode(output, skip_special_tokens=True)
    token_num...
```
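A minimal sketch of the suggested fix, assuming `generation_output` is a single 1-D sequence of token IDs coming out of `model.generate()` in your benchmark script (names are taken from that script, not defined here):

```python
# Decode the whole sequence once instead of looping over it
text = tokenizer.decode(generation_output, skip_special_tokens=True)

# Count generated tokens directly from the sequence length
token_num += len(generation_output)
```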
@abhinavkulkarni Does this integrate with the fused AWQ modules? For maximum speed, you can also use the AutoAWQ speed benchmark, which uses these fused modules by default for all LLaMa...
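For reference, a minimal sketch of how the fused modules are typically enabled when loading with AutoAWQ; the checkpoint path is a placeholder and the exact `fuse_layers` flag should be checked against the AutoAWQ examples:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/Llama-2-7B-AWQ"  # placeholder: any AWQ-quantized checkpoint

# fuse_layers=True enables the fused AWQ modules for faster decoding
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)
```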
AutoAWQ is distributed on PyPI: https://github.com/casper-hansen/AutoAWQ
@wanzhenchn In my testing, AWQ provides a 2x speedup, sometimes even more than that. You should use TinyChat and not `generate()`, because `generate()` is slow.
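If you want to sanity-check the throughput numbers yourself, a rough tokens-per-second measurement around `generate()` could look like the sketch below, reusing `model` and `tokenizer` from the loading sketch above (the prompt and token budget are arbitrary):

```python
import time
import torch

# `model` and `tokenizer` are assumed to be the AWQ model and tokenizer
# loaded earlier (AutoAWQForCausalLM.from_quantized + AutoTokenizer).
inputs = tokenizer("Tell me about AWQ quantization.", return_tensors="pt").to("cuda")

torch.cuda.synchronize()
start = time.time()
output = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.time() - start

# Tokens generated beyond the prompt, divided by wall-clock time
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"generate(): {new_tokens / elapsed:.1f} tokens/s")  # compare against TinyChat's numbers
```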
I have seen this error before, but I'm not quite sure why it happens. If I remember correctly, it happened to me with the 7B and 13B LLaMa 2 models....
> Hi Authors,
>
> Any plans to release Vicuna-1.5 quantized weights? Thanks

Hi @mmaaz60, do you have access to a GPU? If so, I believe it should be easy...
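If you do have a GPU, quantizing the weights yourself is roughly the following, for example with AutoAWQ; treat the paths and config keys here as placeholders/assumptions and check the AutoAWQ README for the current API:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "lmsys/vicuna-7b-v1.5"   # placeholder: the FP16 checkpoint to quantize
quant_path = "vicuna-7b-v1.5-awq"     # where the quantized weights will be saved

# Typical 4-bit AWQ settings; exact keys may differ between versions
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration + quantization, then save the quantized model
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```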
> not working with FastChat.

I see. This may be the fault of FastChat and not AWQ. Did you try TinyChat?