
[vLLM] Optimizing vLLM models with QBits


Type of Change

New feature & API change

Description

  • [x] Complete the pipeline that replaces part of vLLM's linear modules with QBits linear layers (ChatGLM2).
  • [ ] vLLM integration API design: vllm_model = AutoModelForCausalLM.from_pretrained(args.model, use_vllm=True) (see the sketch after this list).
  • [ ] Make ITREX work with pytorch==2.3.0 + CPU.
  • [ ] Extend acceleration to more models.
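To make the replacement pipeline concrete, below is a minimal sketch of the module-swap pattern, assuming a plain PyTorch front end. `QBitsLinear` and `replace_linear_with_qbits` are hypothetical names used only to illustrate the approach; the real pipeline would dispatch to the ITREX QBits kernels rather than the placeholder compute shown here.

```python
import torch
import torch.nn as nn


class QBitsLinear(nn.Module):
    """Hypothetical stand-in for a QBits-backed linear layer.

    A real implementation would store the weight in a low-precision format
    and dispatch the matmul to ITREX QBits CPU kernels.
    """

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Placeholder: reuse the original weights; QBits would quantize them.
        self.weight = linear.weight
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder compute path; QBits would run an optimized low-bit GEMM.
        return nn.functional.linear(x, self.weight, self.bias)


def replace_linear_with_qbits(module: nn.Module) -> nn.Module:
    """Recursively swap nn.Linear submodules (e.g. in ChatGLM2 blocks) for QBitsLinear."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, QBitsLinear(child))
        else:
            replace_linear_with_qbits(child)
    return module
```

With the API proposed above, this swap would be triggered internally by `AutoModelForCausalLM.from_pretrained(args.model, use_vllm=True)`.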

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

How to reproduce the test (including hardware information).

Dependency Change?

Any library dependency introduced or removed.

Zhenzhong1 · May 15 '24