support gguf_q4k_m / gguf_q4k_s
Description
1. Why the change?
https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2076658639
2. User API changes
```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM

# Load with the new Q4_K_M-style mixed low-bit precision
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_m',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)

# Load with the new Q4_K_S-style low-bit precision
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_s',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
```
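For reference, a minimal generation sketch with the model loaded above; the tokenizer path, prompt, and generation parameters are placeholders, and inputs would need `.to('xpu')` if the model is moved to an Intel GPU:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids

with torch.inference_mode():
    # Greedy decode a short continuation to sanity-check the low-bit model
    output = model.generate(input_ids, max_new_tokens=32, use_cache=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```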
3. Summary of the change
- add q5k precision
- add gguf_q4k_m / gguf_q4k_s
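For reviewers trying the new precisions locally, the converted weights can be persisted so quantization is not repeated on every run; a minimal sketch assuming ipex-llm's existing `save_low_bit` / `load_low_bit` helpers (not part of this change), with a placeholder save path:

```python
# Persist the gguf_q4k_m-quantized weights to disk
model.save_low_bit('./model-gguf-q4k-m')

# Reload the model directly in low-bit form later
model = AutoModelForCausalLM.load_low_bit('./model-gguf-q4k-m',
                                          trust_remote_code=True,
                                          use_cache=True)
```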
4. How to test?
- [x] Local test
- [ ] Unit test
- [x] performance validation: https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2078580860