KoboldAI

Slow speed for some models.

Open BadisG opened this issue 2 years ago • 4 comments

Hey,

I tried using this fork and noticed that it is really slow for some of the models I use, for example https://huggingface.co/reeducator/vicuna-13b-cocktail/tree/main

For vicuna-cocktail, for example, I get about 2 tokens/s, even though I easily reach 10 tokens/s on ooba's webui.

Some other models (like raw llama 13b) give me 7 tokens/s, which is fine.

I guess this has to do with vicuna-cocktail not having been saved with the "save_pretrained" option? I don't know, I'm just guessing here.

Anyway, if you could look into that and try to get "normal" speed in every situation, that would be cool.

Thanks in advance.

BadisG • May 21 '23 10:05

When loading a model, it tells you the quantization version. Versions 0 and 2 are slow: 0 because it is old, 2 because upstream GPTQ prefers accuracy over speed. If you want fast models, use version 1. They usually show up on Hugging Face as compatible with KoboldAI.

0cc4m • May 21 '23 11:05

> When loading a model, it tells you the quantization version.

Oh yeah, I have Version 2.

But still, even with those "slow" models I have I can get 10 tokens/s on ooba's webui, so it means there's a way to get the same speed on KoboldAI

BadisG • May 21 '23 12:05

> But still, even with those "slow" models I have I can get 10 tokens/s on ooba's webui, so it means there's a way to get the same speed on KoboldAI

If that can't be achieved, I have two questions:

  1. How do you make a "Version 1 GPTQ" when you decide to quantize a model?
  2. Do you lose a lot of accuracy when using version 1?

BadisG • May 21 '23 19:05

I've tried a few models and am seeing the same: 2 tk/s with this version of KoboldAI (same speed as the standard version) and 10-12 tk/s with oobabooga on the same models using exllama.
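
For anyone comparing numbers between the two UIs, here is a minimal sketch of how tokens/s could be measured the same way on both sides. It is not taken from KoboldAI or ooba's webui: `tokens_per_second` and the `generate` callable are hypothetical names, and `generate` is assumed to wrap whatever backend you are testing and return only the newly generated token ids.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[str, int], Sequence[int]],
    prompt: str,
    max_new_tokens: int = 80,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Average generation speed over several runs, in new tokens per second.

    `generate` is a hypothetical wrapper around the backend under test;
    it must return only the newly generated token ids.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = clock()
        new_tokens = generate(prompt, max_new_tokens)
        total_seconds += clock() - start
        # Count what was actually produced, since generation may stop early.
        total_tokens += len(new_tokens)
    if total_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return total_tokens / total_seconds


if __name__ == "__main__":
    # Stand-in backend so the sketch runs without a model loaded.
    def fake_generate(prompt: str, max_new_tokens: int) -> list:
        time.sleep(0.05)
        return list(range(max_new_tokens))

    print(f"{tokens_per_second(fake_generate, 'Hello', 20):.1f} tokens/s")
```

Using the same prompt, the same `max_new_tokens`, and a few runs on each side should make a "2 tk/s vs 10 tk/s" comparison less dependent on context length or a one-off warm-up.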

liquidsnakeblue • Aug 05 '23 05:08