support gguf_q4k_m / gguf_q4k_s
Description
1. Why the change?
https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2076658639
2. User API changes
```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM

# Load with the new Q4_K_M-style mixed low-bit precision
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_m',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)

# Load with the new Q4_K_S-style low-bit precision
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='gguf_q4k_s',
                                             optimize_model=True,
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True,
                                             use_cache=True)
```
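For reference, a minimal generation sketch with the model loaded above; the tokenizer path, prompt, and generation parameters are placeholders, and inputs would need `.to('xpu')` if the model is moved to an Intel GPU:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids

with torch.inference_mode():
    # Greedy decode a short continuation to sanity-check the low-bit model
    output = model.generate(input_ids, max_new_tokens=32, use_cache=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```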
3. Summary of the change
- add q5k precision
- add gguf_q4k_m / gguf_q4k_s
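For reviewers trying the new precisions locally, the converted weights can be persisted so quantization is not repeated on every run; a minimal sketch assuming ipex-llm's existing `save_low_bit` / `load_low_bit` helpers (not part of this change), with a placeholder save path:

```python
# Persist the gguf_q4k_m-quantized weights to disk
model.save_low_bit('./model-gguf-q4k-m')

# Reload the model directly in low-bit form later
model = AutoModelForCausalLM.load_low_bit('./model-gguf-q4k-m',
                                          trust_remote_code=True,
                                          use_cache=True)
```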
4. How to test?
- [x] Local test
- [ ] Unit test
- [x] performance validation: https://github.com/analytics-zoo/nano/issues/1316#issuecomment-2078580860