Forkoz
The alpaca-native model can just be quantized the old way. I didn't see anyone delete any of the .pt files from Hugging Face; have they been doing that? Is the...
I wish they made a 13B; the 7B just runs fast on everything.
I tested this out on alpaca-native from the link above. Is it really supposed to be this slow?
```
Found models/alpaca-native-7b-4bit.pt
Output generated in 25.91 seconds (0.46 tokens/s, 12...
```
There are now GPT-NeoX, GPT-J, 4-bit LoRAs, and GPT-Neo, all with different kernels :( Also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main).
It's for the old GPTQ, but it does work there.
Oh they will happily say your name and their own name too. But so far so good.
I have to hit stop twice. I thought it was me, but it's happening on all models. The original problem for me appears to be gone. I'm not sure if it's...
Change the int threshold and it might work. Try something like 0.5-1.5; see the hint in https://github.com/oobabooga/text-generation-webui/pull/198. It works for me on Linux and should work the same on Windows, but I have 24 GB of RAM.
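The comment doesn't name the exact setting; assuming it means the bitsandbytes LLM.int8 outlier threshold (`llm_int8_threshold`, default 6.0), here is a minimal sketch of setting it when loading a model in 8-bit with Transformers (the model path is illustrative, not from this thread):

```python
# Minimal sketch, assuming "int threshold" refers to bitsandbytes'
# LLM.int8 outlier threshold (llm_int8_threshold, default 6.0).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=1.5,  # try values around 0.5-1.5 as suggested above
)

model = AutoModelForCausalLM.from_pretrained(
    "models/alpaca-native",        # hypothetical local model path
    quantization_config=bnb_config,
    device_map="auto",             # lets accelerate split layers across GPU/CPU
)
```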
You will have to add it to where the model is loaded inside models.py and pray.
I do it like this: https://github.com/Ph0rk0z/text-generation-webui-testing/commit/ecad08f54c3282356888ee8f4dbf112cb331544a
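For reference, a minimal sketch of the kind of models.py change meant here; the function name `load_quant` and the paths are illustrative assumptions, not the code from the linked commit:

```python
# Hypothetical models.py-style patch: route 4-bit checkpoints through a
# GPTQ loader instead of the normal Transformers path. Names are
# illustrative, not the actual repo code.
from pathlib import Path

def load_model(model_name, wbits=0):
    if wbits == 4:
        # Assumes a GPTQ-for-LLaMa style load_quant(model_dir, checkpoint, wbits)
        # is importable from the quantization repo on the Python path.
        from llama import load_quant
        model = load_quant(
            str(Path("models") / model_name),
            str(Path("models") / f"{model_name}-4bit.pt"),
            4,
        )
        return model.to("cuda")
    # Fall back to the normal Transformers path for unquantized models.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(str(Path("models") / model_name))
```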