KoboldAI-Client

Feature request: 4-bit for OPT models

Open anonymous721 opened this issue 1 year ago • 5 comments

Existing OPT models such as Erebus can be quantized to 4-bit using GPTQ-for-LLaMa, and these 4-bit models can be loaded in the other text UI. That way I was able to convert, load, and generate with Erebus 13B on a 6800XT, which otherwise can only fit half the model in 16-bit, and in 8-bit can fit it but not generate with it (due to CUDA vs. ROCm compatibility). But that other UI has quite a few drawbacks compared to Kobold, such as the lack of World Info. Is there any possibility of Kobold adding the ability to use 4-bit models like this?
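For context on what "4-bit" means here: the model's weights are stored as 4-bit integer codes plus a per-group scale and zero point, roughly quartering memory use versus fp16. The sketch below is a minimal round-to-nearest group quantizer in NumPy; it is NOT the GPTQ algorithm (GPTQ chooses rounding to minimize layer output error using second-order information), it only illustrates the storage scheme and the reconstruction error involved.

```python
import numpy as np

def quantize_4bit(weights, group_size=128):
    """Round-to-nearest 4-bit quantization with per-group scale/zero-point.

    Illustrative only -- real GPTQ uses error-aware rounding, but the
    on-disk format (uint4 codes + per-group scale and zero) is similar.
    """
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels (0..15)
    scale[scale == 0] = 1.0                   # guard flat groups against /0
    codes = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return codes, scale, w_min

def dequantize_4bit(codes, scale, zero, shape):
    """Reconstruct approximate fp32 weights from codes, scale, zero-point."""
    return (codes.astype(np.float32) * scale + zero).reshape(shape)

# Demo: quantize a random weight matrix and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
codes, scale, zero = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scale, zero, w.shape)
max_err = float(np.abs(w - w_hat).max())    # bounded by scale / 2 per group
```

The per-group error is bounded by half the group's scale, which is why small group sizes (e.g. 128) keep 4-bit models close to fp16 quality while still fitting a 13B model in well under 16 GB of VRAM.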

anonymous721 avatar Mar 22 '23 18:03 anonymous721

https://github.com/0cc4m/KoboldAI/tree/4bit

Skifoid avatar Mar 25 '23 14:03 Skifoid

Interesting! It doesn't seem to be working yet, but I'll have to keep an eye on it.

anonymous721 avatar Mar 26 '23 03:03 anonymous721

any updates?

Gitterman69 avatar Apr 03 '23 20:04 Gitterman69

We are getting close to finishing the backend overhaul for the KoboldAI United branch that unblocks implementations like this. Until then, upstreaming occam's work remains blocked.

henk717 avatar Apr 03 '23 20:04 henk717

Any idea what happened to the 4-bit branch? Is it ready to use?

bharadwajpro avatar Jun 01 '23 18:06 bharadwajpro