
support gguf_q4k_m / gguf_q4k_s

Open · rnwang04 opened this issue

Description

1. Why the change?

See the discussion in https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2076658639

2. User API changes

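  import torch
  from ipex_llm.transformers import AutoModelForCausalLM

  model_path = "path/to/model"  # placeholder: local checkpoint directory or Hugging Face model id

  # Option 1: load the weights in the gguf_q4k_m low-bit format
  model = AutoModelForCausalLM.from_pretrained(model_path,
                                               load_in_low_bit='gguf_q4k_m',
                                               optimize_model=True,
                                               torch_dtype=torch.float16,
                                               trust_remote_code=True,
                                               use_cache=True)

  # Option 2: load the weights in the gguf_q4k_s low-bit format
  model = AutoModelForCausalLM.from_pretrained(model_path,
                                               load_in_low_bit='gguf_q4k_s',
                                               optimize_model=True,
                                               torch_dtype=torch.float16,
                                               trust_remote_code=True,
                                               use_cache=True)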

3. Summary of the change

  • Add q5k precision
  • Add the gguf_q4k_m / gguf_q4k_s low-bit formats
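A plausible reason q5k precision ships together with these formats: in llama.cpp, Q4_K_M and Q4_K_S are not uniformly 4-bit. They keep most weights in Q4_K but promote some attn_v and ffn_down tensors to Q6_K (Q4_K_M) or Q5_K (Q4_K_S). The sketch below models that per-tensor choice, assuming ipex-llm follows llama.cpp's `llama_tensor_get_type` rules; `use_more_bits` mirrors the llama.cpp helper of that name, while `pick_qtype` and `BITS_PER_WEIGHT` are hypothetical names for illustration only. It deliberately omits llama.cpp's output-tensor, GQA and architecture-specific special cases, so treat it as a simplified illustration, not the ipex-llm implementation.

```python
# Simplified sketch modeled on llama.cpp's k-quant mixing rules.
# ASSUMPTION: ipex-llm's gguf_q4k_m / gguf_q4k_s may differ in detail.

# Storage cost per 256-weight super-block in llama.cpp:
# Q4_K = 144 bytes, Q5_K = 176 bytes, Q6_K = 210 bytes.
BITS_PER_WEIGHT = {
    "q4_k": 144 * 8 / 256,  # 4.5 bits per weight
    "q5_k": 176 * 8 / 256,  # 5.5 bits per weight
    "q6_k": 210 * 8 / 256,  # 6.5625 bits per weight
}


def use_more_bits(i_layer: int, n_layers: int) -> bool:
    """llama.cpp rule: first 1/8, last 1/8, and every third layer in between."""
    return (i_layer < n_layers // 8
            or i_layer >= 7 * n_layers // 8
            or (i_layer - n_layers // 8) % 3 == 2)


def pick_qtype(mode: str, tensor_kind: str, i_layer: int, n_layers: int) -> str:
    """Return the k-quant type for one weight tensor (hypothetical helper).

    mode:        "gguf_q4k_m" or "gguf_q4k_s"
    tensor_kind: "attn_v", "ffn_down", or any other name (e.g. "attn_q")
    """
    if tensor_kind in ("attn_v", "ffn_down"):
        if mode == "gguf_q4k_m" and use_more_bits(i_layer, n_layers):
            return "q6_k"
        if mode == "gguf_q4k_s":
            # Q4_K_S: first 4 attn_v layers, first n_layers/8 ffn_down layers
            limit = 4 if tensor_kind == "attn_v" else n_layers // 8
            if i_layer < limit:
                return "q5_k"
    # Everything else stays in plain Q4_K
    return "q4_k"


def mean_bits(qtypes: list) -> float:
    """Average bits per weight, assuming all listed tensors have equal size."""
    return sum(BITS_PER_WEIGHT[t] for t in qtypes) / len(qtypes)


if __name__ == "__main__":
    n = 32  # e.g. a 32-layer decoder
    for mode in ("gguf_q4k_m", "gguf_q4k_s"):
        types = [pick_qtype(mode, "ffn_down", i, n) for i in range(n)]
        counts = {t: types.count(t) for t in sorted(set(types))}
        print(mode, counts, f"ffn_down avg = {mean_bits(types):.3f} bpw")
```

With 32 layers, this rule set puts half of the ffn_down tensors in Q6_K for gguf_q4k_m, but only the first four in Q5_K for gguf_q4k_s, which is why the "_m" variant is larger and usually more accurate.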

4. How to test?

  • [x] Local test
  • [ ] Unit test
  • [x] performance validation: https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2078580860

rnwang04, Apr 25 '24 10:04