
[NeuralChat] Add Multi-Socket LLM Inference Example

Open letonghan opened this issue 1 year ago • 0 comments

Type of Change

Add NeuralChat example. API not changed.

Description

Add Multi-Socket LLM inference example for NeuralChat. Related DeepSpeed PR: https://github.com/microsoft/DeepSpeed/pull/4750 (not merged yet)

Expected Behavior & Potential Risk

Customers can run multi-socket LLM inference with DeepSpeed by following this example.
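
To make the multi-socket idea concrete, here is a minimal sketch of how one inference rank per socket could be pinned to that socket's cores and memory. It is not the example added by this PR, and the DeepSpeed launcher or the example script may handle binding differently. The names `RankBinding`, `plan_bindings`, and `numactl_prefix` are hypothetical, and the sketch assumes cores are numbered contiguously per socket and that the NUMA node ID equals the socket index (which does not hold when sub-NUMA clustering is enabled).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankBinding:
    """Hypothetical record: which cores and NUMA node one rank is pinned to."""
    rank: int
    numa_node: int
    cores: range


def plan_bindings(num_sockets: int, cores_per_socket: int) -> List[RankBinding]:
    """Assign one rank per socket.

    Assumption: socket s owns cores [s * cores_per_socket, (s + 1) * cores_per_socket)
    and its memory is NUMA node s.
    """
    if num_sockets < 1 or cores_per_socket < 1:
        raise ValueError("num_sockets and cores_per_socket must be positive")
    return [
        RankBinding(
            rank=s,
            numa_node=s,
            cores=range(s * cores_per_socket, (s + 1) * cores_per_socket),
        )
        for s in range(num_sockets)
    ]


def numactl_prefix(binding: RankBinding) -> List[str]:
    """Build a numactl prefix that pins a process to the rank's cores (-C)
    and allocates its memory on the rank's NUMA node (-m)."""
    core_list = f"{binding.cores.start}-{binding.cores.stop - 1}"
    return ["numactl", "-C", core_list, "-m", str(binding.numa_node)]


if __name__ == "__main__":
    # Illustrative topology only: 2 sockets with 56 cores each.
    for b in plan_bindings(num_sockets=2, cores_per_socket=56):
        print(b.rank, " ".join(numactl_prefix(b)))
```

Keeping each rank's threads and memory on a single socket avoids cross-socket memory traffic, which is the usual motivation for splitting a model across sockets instead of running one process over all cores.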

How has this PR been tested?

Tested locally on an SPR server.

Dependency Change?

No.

letonghan, Dec 25 '23 08:12