intel-extension-for-transformers [NeuralChat] Add Multi-Socket LLM Inference Example


letonghan opened this issue 1 year ago.

Type of change: adds a NeuralChat example; API not changed.

Description: adds a multi-socket LLM inference example for NeuralChat. Related DeepSpeed PR: https://github.com/microsoft/DeepSpeed/pull/4750 (not yet merged).

Expected behavior: by following this example, customers can run LLM inference across multiple CPU sockets with DeepSpeed.
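Multi-socket CPU inference usually runs one process (rank) per socket, pins each rank to that socket's cores, and keeps its memory on the local NUMA node. The sketch below only plans that binding. It is not part of this PR or of DeepSpeed, and `plan_socket_binding` and `RankBinding` are hypothetical names. It assumes cores are numbered contiguously per socket and that NUMA node index equals socket index. Both assumptions break on some machines, for example with sub-NUMA clustering enabled, so check `lscpu` first. The launcher used by the actual example may already handle binding.

```python
from dataclasses import dataclass, field


@dataclass
class RankBinding:
    """Hypothetical per-rank placement: which cores and NUMA node one rank uses."""
    rank: int
    cores: list[int]
    numactl_prefix: list[str]
    env: dict[str, str] = field(default_factory=dict)


def plan_socket_binding(num_sockets: int, cores_per_socket: int) -> list[RankBinding]:
    """Plan one rank per socket (hypothetical helper, illustration only).

    Assumes socket s owns cores [s * cores_per_socket, (s + 1) * cores_per_socket)
    and that NUMA node s is local to socket s.
    """
    if num_sockets < 1 or cores_per_socket < 1:
        raise ValueError("num_sockets and cores_per_socket must be positive")

    plans = []
    for rank in range(num_sockets):
        first = rank * cores_per_socket
        last = first + cores_per_socket - 1
        plans.append(
            RankBinding(
                rank=rank,
                cores=list(range(first, last + 1)),
                # numactl -C pins to these CPUs; -m allocates memory only on this node.
                numactl_prefix=["numactl", "-C", f"{first}-{last}", "-m", str(rank)],
                # One OpenMP thread per physical core of the socket.
                env={"OMP_NUM_THREADS": str(cores_per_socket)},
            )
        )
    return plans


if __name__ == "__main__":
    for b in plan_socket_binding(num_sockets=2, cores_per_socket=56):
        print(b.rank, " ".join(b.numactl_prefix), b.env)
```

Each printed prefix would go in front of one rank's inference command. Under DeepSpeed, each such rank would typically hold one tensor-parallel shard of the model.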

Testing: tested locally on a Sapphire Rapids (SPR) server.

Dependency change: no.

Dec 25 '23 08:12 letonghan