FastChat
Intel ARC/IPEX: Implement new LLM optimisations & Consolidate CPU & XPU IPEX optimisation branches
Why are these changes needed?
Intel Extension for PyTorch (IPEX) versions 2.3.110+xpu and 2.4.0+cpu implement LLM-specific optimisations; this PR makes use of them in FastChat. This seems to make inference slightly faster when using IPEX with either a CPU or an Intel GPU (XPU).
While adding this, I also consolidated the separate IPEX-specific CPU and XPU optimisation `if` statements into a single branch.
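For reviewers unfamiliar with the IPEX LLM API, here is a minimal sketch of what a single consolidated CPU/XPU branch could look like. This is not the code in this PR. The helper name `maybe_ipex_optimize` and its `optimize_fn` parameter are hypothetical. The sketch assumes the `ipex.llm.optimize(model, dtype=..., device=..., inplace=...)` entry point available in IPEX 2.3 and later (check the signature against your installed version), and it assumes the model has already been moved to the target device.

```python
import warnings

# Devices this sketch routes through Intel Extension for PyTorch (IPEX).
IPEX_DEVICES = ("cpu", "xpu")


def maybe_ipex_optimize(model, device, dtype, optimize_fn=None):
    """Apply IPEX LLM optimisations to `model` in one branch for CPU and XPU.

    Hypothetical helper: `optimize_fn` defaults to `ipex.llm.optimize`
    (assumed API) and can be injected so the branching logic is testable
    without IPEX installed. Assumes `model` is already on `device`.
    """
    if device not in IPEX_DEVICES:
        # CUDA, MPS, NPU, etc. are returned unchanged.
        return model

    if optimize_fn is None:
        try:
            import intel_extension_for_pytorch as ipex
        except ImportError:
            warnings.warn(
                "Intel Extension for PyTorch is not installed; "
                "skipping IPEX optimisations."
            )
            return model
        optimize_fn = ipex.llm.optimize

    # The LLM optimisations target inference, so put the model in eval mode
    # first, then hand the device and dtype to the same call for CPU and XPU.
    return optimize_fn(model.eval(), dtype=dtype, device=device, inplace=True)
```

Injecting `optimize_fn` keeps the device branching testable on machines without IPEX. In normal use it is left as `None`, so IPEX is imported lazily and CUDA or MPS users do not take on a hard dependency.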
Related issue number (if applicable)
N/A
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).