intel-extension-for-transformers
[vLLM] Optimize vLLM models with QBits.
Type of Change
New feature & API change
Description
- [x] Complete the pipeline that replaces part of vLLM's linear modules with QBits linear modules (ChatGLM2).
- [ ] vLLM integration API design: vllm_model = AutoModelForCausalLM.from_pretrained(args.model, use_vllm=True)
- [ ] Make ITREX work with pytorch==2.3.0 + CPU.
- [ ] Extend acceleration to more models.
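The first checklist item swaps selected vLLM linear modules for QBits-backed ones. A minimal sketch of that replacement step, assuming hypothetical `QBitsLinear` and `replace_linears` names (the real ITREX/QBits classes and the set of targeted projections may differ):

```python
# Illustrative sketch of replacing a model's linear modules with
# QBits-backed ones. Class and function names here are hypothetical,
# not the actual intel-extension-for-transformers API.

class Linear:
    """Stand-in for a vLLM linear layer."""
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

class QBitsLinear(Linear):
    """Hypothetical QBits-accelerated drop-in replacement."""
    @classmethod
    def from_linear(cls, linear):
        # A real implementation would also repack/quantize the weights.
        return cls(linear.in_features, linear.out_features)

class Model:
    def __init__(self, modules):
        self.modules = modules  # name -> module

def replace_linears(model, target_names):
    """Swap only the targeted linear modules (e.g. attention/MLP projections)."""
    for name in target_names:
        mod = model.modules.get(name)
        if isinstance(mod, Linear) and not isinstance(mod, QBitsLinear):
            model.modules[name] = QBitsLinear.from_linear(mod)
    return model

model = Model({"qkv_proj": Linear(4096, 12288), "mlp_up": Linear(4096, 11008)})
replace_linears(model, ["qkv_proj"])
print(type(model.modules["qkv_proj"]).__name__)  # QBitsLinear
print(type(model.modules["mlp_up"]).__name__)    # Linear (untouched)
```

With the proposed `use_vllm=True` flag, a replacement pass like this would run inside `from_pretrained`, so callers get a vLLM model whose hot linear layers are already QBits-accelerated.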
Expected Behavior & Potential Risk
The expected behavior triggered by this PR.
How has this PR been tested?
How to reproduce the test (including hardware information).
Dependency Change?
Any library dependency introduced or removed.