
Ascend support

[Open] Tint0ri opened this issue 1 year ago · 5 comments

Is your feature request related to a problem? Please describe

No

Describe the solution you'd like

Huawei's Ascend NPUs are getting more and more attention in the Chinese market. Please support Ascend in Xinference.

Describe alternatives you've considered

Integrating FastChat as a backend may be a shortcut to implementing this feature. FastChat has announced support for Ascend NPUs, and according to our tests, FastChat DOES work on Ascend. As a plus, FastChat also supports exllamav2 on the CUDA architecture.
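As a sketch of the shortcut described above, launching a model through FastChat on an Ascend NPU might look like the following. This is a hypothetical launch configuration: the `--device npu` flag and the model path are assumptions based on FastChat's documented CLI, so verify them against your installed FastChat version.

```shell
# Hypothetical: run FastChat's CLI on an Ascend NPU.
# --device npu is assumed to select the Ascend backend
# (requires the torch_npu / CANN stack to be installed).
# The model path is an example; substitute the model you want to serve.
python3 -m fastchat.serve.cli \
    --model-path baichuan-inc/Baichuan2-13B-Chat \
    --device npu
```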

Additional context

State-owned enterprises (SOEs) in China have been asked to adopt hardware from local AI vendors such as Ascend, Cambricon (寒武纪), etc.

Tint0ri avatar Apr 18 '24 05:04 Tint0ri

Which model did you launch for FastChat on Ascend?

qinxuye avatar Apr 18 '24 09:04 qinxuye

Baichuan2 and Qwen1.5. It looks like Qwen1.5 has a concurrency issue on Ascend; Baichuan2 works fine.

Tint0ri avatar Apr 18 '24 12:04 Tint0ri

> Baichuan2 and Qwen1.5. Looks like Qwen has concurrent issue on Ascend. Baichuan2 works fine.

Do you mean Qwen1.5 has a concurrency issue?

qinxuye avatar Apr 23 '24 07:04 qinxuye

In our tests of FastChat on an Ascend 310B, the output sometimes gets garbled under concurrent input. Tested with Qwen1.5-14B.

Tint0ri avatar Apr 27 '24 14:04 Tint0ri

Ascend support was introduced in #1408; I tested baichuan-2 and qwen.

qinxuye avatar Apr 29 '24 07:04 qinxuye