Intel ARC/IPEX: Implement new LLM optimisations & Consolidate CPU & XPU IPEX optimisation branches

Open Steve-Tech opened this issue 1 year ago • 0 comments

Why are these changes needed?

Intel Extension for PyTorch (IPEX) versions 2.3.110+xpu and 2.4.0+cpu implement LLM-specific optimisations; this PR makes use of them in FastChat. This seems to make inference slightly faster when using IPEX on a CPU or GPU.

While adding this, I also consolidated the separate IPEX-specific CPU and XPU optimisation if statements into a single one.
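As a rough illustration of what a consolidated branch could look like, here is a minimal sketch. It is not the code from this PR: the helper name `apply_ipex`, the `IPEX_DEVICES` tuple, and the fallback logic are hypothetical. It assumes the installed IPEX build exposes `ipex.llm.optimize(model, dtype=..., device=...)` for the LLM-specific path and `ipex.optimize(model, dtype=...)` as the generic path. The `ipex` module is passed in as an argument so the sketch can be exercised without IPEX installed.

```python
# Hypothetical sketch: one code path for both CPU and XPU instead of two
# separate device-specific if statements.
IPEX_DEVICES = ("cpu", "xpu")


def apply_ipex(model, device, dtype, ipex):
    """Return `model` optimised by IPEX when `device` is one IPEX handles.

    `ipex` is the imported intel_extension_for_pytorch module (assumption:
    the caller has already imported it and checked it is available).
    """
    if device not in IPEX_DEVICES:
        # Other devices (e.g. "cuda") are left untouched.
        return model

    llm = getattr(ipex, "llm", None)
    if llm is not None and hasattr(llm, "optimize"):
        # Newer IPEX releases: LLM-specific optimisation path.
        return llm.optimize(model, dtype=dtype, device=device)

    # Older releases without the LLM module: generic optimisation path.
    return ipex.optimize(model, dtype=dtype)
```

A caller would pass the real module, for example `apply_ipex(model, "xpu", torch.bfloat16, ipex)` after `import intel_extension_for_pytorch as ipex`. Probing for `ipex.llm` rather than comparing version strings is only one option; it keeps the sketch independent of the differing CPU and XPU version numbers.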

Related issue number (if applicable)

N/A

Checks

  • [x] I've run format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed.
  • [x] I've made sure the relevant tests are passing (if applicable).

Steve-Tech, Sep 19 '24 11:09