KoboldAI-Client
Feature request: 4-bit for OPT models
Existing OPT models such as Erebus can be quantized to 4-bit with GPTQ-for-LLaMa, and the resulting 4-bit models can be loaded in the other text UI. That way I was able to convert, load, and generate with Erebus 13B on a 6800XT, which otherwise can only fit half the model in 16-bit; in 8-bit the full model fits, but generation fails (due to CUDA vs. ROCm compatibility). But that other UI has quite a few drawbacks compared to Kobold, such as the lack of World Info. Is there any possibility of Kobold adding the ability to use 4-bit models like this?
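For anyone unfamiliar with what 4-bit quantization does: each group of weights is mapped to small integer codes plus a per-group float scale, so the model takes roughly a quarter of the 16-bit memory at the cost of some precision. The sketch below is a simplified round-to-nearest scheme, not GPTQ itself (GPTQ additionally compensates quantization error layer by layer); all names here are illustrative, and real implementations pack two 4-bit codes per byte rather than storing them in int8.

```python
import numpy as np

def quantize_4bit(weights, group_size=4):
    """Symmetric round-to-nearest 4-bit quantization per group.

    Returns int codes in [-8, 7] plus one float scale per group.
    (Illustrative sketch only -- GPTQ uses error-compensated updates.)
    """
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| to code 7
    scale[scale == 0] = 1.0                             # avoid division by zero
    codes = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from codes and scales."""
    return (codes.astype(np.float32) * scale).reshape(-1)

# Tiny demo: quantize eight weights in two groups and check the error bound.
w = np.array([0.12, -0.30, 0.07, 0.25, 1.4, -0.9, 0.0, 0.5], dtype=np.float32)
codes, scale = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scale)
err = np.abs(w - w_hat).max()  # worst case is half a quantization step
```

With round-to-nearest, the reconstruction error per weight is bounded by half a quantization step (`scale / 2` for its group), which is why accuracy loss stays small for well-behaved weight distributions.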
https://github.com/0cc4m/KoboldAI/tree/4bit
Interesting! It doesn't seem to be working yet, but I'll have to keep an eye on it.
Any updates?
We are getting close to finishing the backend overhaul for the KoboldAI United branch, which unblocks implementations like this. Until then, upstreaming occam's work remains blocked.
Any idea what happened to the 4bit branch? Is it ready to use?