
Adding support for 8-bit quantization

Open Darren80 opened this issue 1 year ago • 3 comments

Adding support for 8-bit quantization would be a good idea, because it could fill the gap for people with less GPU VRAM to work with. So I think it would be worth adding if possible. Thank you.

Darren80 avatar Jan 26 '24 15:01 Darren80

Great idea! As usual, if this gets more upvotes, it'll signal that I definitely have to add it to my roadmap :)) Since we're still just 2 brothers, I'll see what I can do when I have bandwidth :)

danielhanchen avatar Jan 26 '24 18:01 danielhanchen

No worries mate.

Darren80 avatar Jan 28 '24 15:01 Darren80

@danielhanchen When I set load_in_8bit=True in my code,

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name, # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    load_in_8bit = True,
    load_in_4bit = False,
)
I encountered the following error:

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != signed char

If this is because 8-bit is not yet supported, I would kindly request that you consider adding 8-bit quantization.
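For context, the `c10::Half != signed char` error means an int8 (`signed char`) weight matrix reached a matmul against fp16 (`c10::Half`) activations without being dequantized or routed through an int8-aware kernel first. The sketch below is a minimal, illustrative implementation of absmax (symmetric) 8-bit quantization, the basic idea behind `load_in_8bit`; it is not unsloth's or bitsandbytes' actual code, and the function names are hypothetical.

```python
# Illustrative sketch of absmax int8 quantization (NOT unsloth's implementation).
# Each float is mapped to an integer in [-127, 127] via a single absmax scale;
# dequantizing multiplies back by that scale, recovering the values approximately.

def quantize_absmax_int8(values):
    """Quantize a list of floats to int8 codes plus one absmax scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return [0] * len(values), 0.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats by rescaling the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_absmax_int8(weights)   # codes are small integers
restored = dequantize_int8(codes, scale)       # close to the original weights
```

The practical point is that the int8 codes must be dequantized (or handled by a dedicated int8 GEMM) before mixing with half-precision tensors; multiplying the raw codes against fp16 activations is exactly the dtype clash the traceback reports.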

JhonDan1999 avatar Mar 29 '24 11:03 JhonDan1999