FastChat Intel ARC/IPEX: Implement new LLM optimisations & Consolidate CPU & XPU IPEX optimisation branches

Open Steve-Tech opened this issue 1 year ago • 0 comments

Why are these changes needed?

Intel Extension for PyTorch (IPEX) versions 2.3.110+xpu and 2.4.0+cpu implement LLM-specific optimisations. This PR makes use of them in FastChat, which seems to make inference slightly faster when using IPEX on a CPU or GPU.

While adding this, I also consolidated the separate IPEX-specific CPU and XPU optimisation if statements into a single branch.
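As a rough illustration of what a consolidated branch could look like (this is a sketch, not the code from this PR): the helper name `apply_ipex_optimisations` and its `ipex` parameter are hypothetical, and the fallback to the generic `ipex.optimize` for older IPEX builds is an assumption about how one might handle versions without the LLM entry point.

```python
def apply_ipex_optimisations(model, device, dtype, ipex=None):
    """Hypothetical helper: one IPEX branch shared by CPU and XPU.

    `ipex` can be passed in explicitly (handy for testing); otherwise
    intel_extension_for_pytorch is imported lazily.
    """
    if device not in ("cpu", "xpu"):
        return model  # IPEX only applies to Intel CPU/XPU targets

    if ipex is None:
        try:
            import intel_extension_for_pytorch as ipex
        except ImportError:
            return model  # IPEX not installed: leave the model untouched

    # Assumption: the caller has already called model.eval() and, for XPU,
    # moved the model to the "xpu" device before reaching this point.
    llm = getattr(ipex, "llm", None)
    if llm is not None and hasattr(llm, "optimize"):
        # Newer IPEX releases expose LLM-specific optimisations here.
        return llm.optimize(model, dtype=dtype, device=device, inplace=True)

    # Older releases: fall back to the generic optimiser.
    return ipex.optimize(model, dtype=dtype, inplace=True)
```

In practice `dtype` would typically be `torch.bfloat16`, and the call would sit wherever the model is loaded, replacing the two device-specific if statements.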

Related issue number (if applicable)

N/A

Checks

[x] I've run format.sh to lint the changes in this PR.
[x] I've included any doc changes needed.
[x] I've made sure the relevant tests are passing (if applicable).

Sep 19 '24 11:09 Steve-Tech
