intel-extension-for-transformers
No performance difference observed using HF Transformers vs. the intel-extension-for-transformers Python API/library
Hello, I am running a PEFT + quantized (BitsAndBytes) Falcon model.
I am trying to follow the instructions for using the Python API.
While I am able to load the model on my Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz dockerized server, I don't see any difference in inference speed between using Hugging Face Transformers directly and the Intel extension Python library.
Is the main purpose of this library to quantize models and generate model binaries to be run on Intel CPUs, or is it supposed to improve inference speed as well? If inference speed is not a focus of this project, is there any library or method that can help speed up inference on Intel CPUs?
Looking forward to solutions in case I am missing something, or to other possible approaches.
Thanks.
Hi, thanks for using the project. I found that your CPU is Skylake, which has relatively limited ISA support, so there is less optimization we can do on this kind of CPU. Could you try an ICX or SPR CPU? We have many advanced performance optimizations on the VNNI/AMX ISAs. You would get MHA/FFN fusion, advanced JIT kernel dispatch, and other features that further improve performance.
Thanks
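As a rough way to see why a Skylake-era Xeon like the 8167M gets fewer optimizations, one can check which ISA extensions the CPU actually reports. The helper below is a sketch and is not part of intel-extension-for-transformers; the flag names follow Linux `/proc/cpuinfo` conventions.

```python
# Hypothetical helper (not part of intel-extension-for-transformers):
# estimates which inference-relevant Intel ISA extensions a CPU supports,
# given its feature-flag list as reported in /proc/cpuinfo on Linux.

def detect_isa_levels(flags):
    """Map CPU feature flags to the ISA tiers relevant to LLM inference."""
    flags = set(flags)
    return {
        "avx2": "avx2" in flags,          # client CPUs, e.g. Core i7-10700
        "avx512": "avx512f" in flags,     # Skylake-SP, e.g. Xeon 8167M
        "vnni": "avx512_vnni" in flags,   # Cascade Lake / Ice Lake (ICX)
        "amx": "amx_tile" in flags,       # Sapphire Rapids (SPR)
    }

# Example: flags a Skylake-SP Xeon typically reports — AVX-512 is present,
# but VNNI/AMX are not, so the fused int8 kernels mentioned above don't apply.
skylake_flags = ["sse4_2", "avx", "avx2", "avx512f", "avx512bw"]
print(detect_isa_levels(skylake_flags))

# On Linux the real flags can be read with something like:
# flags = next(l for l in open("/proc/cpuinfo") if l.startswith("flags")).split()
```

On an SPR machine the same check would report `vnni` and `amx` as True, which is where the MHA/FFN fusion and JIT kernel dispatch deliver the largest gains.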
Does this support Intel 10700 ?
The 10700 has AVX2, and LLM Runtime supports it. It's not just a tool for generating quantized models; we provide fusion and kernels for inference. You can check the performance data in https://medium.com/@NeuralCompressor/llm-performance-of-intel-extension-for-transformers-f7d061556176
We will check the status of the Platinum 8167M, but we need time to find the machine. Can you share your model?
We have improved performance on client CPUs, but still can't find an 8167M machine. Closing this issue for now; you can try the performance again.