KoboldAI-Client
Feature request: 4-bit for OPT models
Existing OPT models such as Erebus can be quantized to 4-bit with GPTQ-for-LLaMa, and the resulting 4-bit models can be loaded in the other text UI. That way I was able to convert, load, and generate with Erebus 13B on a 6800XT, which otherwise can only fit half the model in 16-bit; in 8-bit the full model fits, but generation fails (due to CUDA vs. ROCm compatibility). But that other UI has quite a few drawbacks compared to Kobold, such as the lack of World Info. Is there any possibility of Kobold adding the ability to use 4-bit models like this?
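For anyone unfamiliar with what 4-bit quantization does: each group of weights is mapped to small integer codes plus a per-group float scale, so the model takes roughly a quarter of the 16-bit memory at the cost of some precision. The sketch below is a simplified round-to-nearest scheme, not GPTQ itself (GPTQ additionally compensates quantization error layer by layer); all names here are illustrative, and real implementations pack two 4-bit codes per byte rather than storing them in int8.

```python
import numpy as np

def quantize_4bit(weights, group_size=4):
    """Symmetric round-to-nearest 4-bit quantization per group.

    Returns int codes in [-8, 7] plus one float scale per group.
    (Illustrative sketch only -- GPTQ uses error-compensated updates.)
    """
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| to code 7
    scale[scale == 0] = 1.0                             # avoid division by zero
    codes = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize_4bit(codes, scale):
    """Recover approximate float weights from codes and scales."""
    return (codes.astype(np.float32) * scale).reshape(-1)

# Tiny demo: quantize eight weights in two groups and check the error bound.
w = np.array([0.12, -0.30, 0.07, 0.25, 1.4, -0.9, 0.0, 0.5], dtype=np.float32)
codes, scale = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scale)
err = np.abs(w - w_hat).max()  # worst case is half a quantization step
```

With round-to-nearest, the reconstruction error per weight is bounded by half a quantization step (`scale / 2` for its group), which is why accuracy loss stays small for well-behaved weight distributions.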
https://github.com/0cc4m/KoboldAI/tree/4bit
Interesting! It doesn't seem to be working yet, but I'll have to keep an eye on it.
Any updates?
We are getting close to finishing the backend overhaul for the KoboldAI United branch, which unblocks implementations like this. Until then, upstreaming occam's work remains blocked.
Any idea what happened to the 4bit branch? Is it ready to use?