intel-extension-for-transformers
[NeuralChat] Add Multi-Socket LLM Inference Example
Type of Change
Add NeuralChat example
API not changed
Description
Add a multi-socket LLM inference example for NeuralChat. Related DeepSpeed PR: https://github.com/microsoft/DeepSpeed/pull/4750 (not yet merged)
Expected Behavior & Potential Risk
Customers can run multi-socket LLM inference with DeepSpeed by following this example.
How has this PR been tested?
Tested locally on an SPR server.
Dependency Change?
No.