Ascend support
Is your feature request related to a problem? Please describe
No
Describe the solution you'd like
Ascend from Huawei is getting more and more attention in the Chinese market. Please support Ascend in xInference.
Describe alternatives you've considered
Maybe integrating FastChat as a backend is a shortcut to implementing this feature. FastChat has announced that it supports the Ascend NPU, and according to our tests, FastChat DOES support Ascend. As a plus, FastChat also supports ExLlamaV2 on the CUDA architecture.
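For anyone evaluating this route, here is a minimal sanity check (a sketch, assuming Ascend's `torch_npu` PyTorch adapter and the CANN toolkit are installed) that the NPU is visible to PyTorch before pointing a FastChat worker at it:

```python
# Check that PyTorch can see the Ascend NPU. Assumes the torch_npu
# plugin (Ascend's PyTorch adapter) is installed alongside torch.
import torch
import torch_npu  # registers the torch.npu backend

if torch.npu.is_available():
    print(f"Found {torch.npu.device_count()} Ascend NPU device(s)")
    x = torch.ones(2, 2).to("npu:0")  # move a tensor onto the first NPU
    print(x * 2)
else:
    print("No Ascend NPU visible; check the CANN toolkit and driver install")
```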
Additional context
SOEs in China have been asked to adopt local AI hardware vendors such as Ascend, Cambricon (寒武纪), etc.
Which models did you launch with FastChat on Ascend?
Baichuan2 and Qwen1.5. It looks like Qwen has a concurrency issue on Ascend; Baichuan2 works fine.
Do you mean Qwen1.5 has a concurrency issue?
According to our tests of FastChat on the Ascend 310B, the output sometimes gets garbled under concurrent requests. Tested with Qwen1.5-14B.
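A rough way to reproduce this kind of problem (a sketch, assuming an OpenAI-compatible server such as FastChat's is listening at `http://localhost:8000/v1` and the model is registered as `qwen1.5-14b`; both names are assumptions, adjust to your deployment):

```python
# Send identical concurrent requests to an OpenAI-compatible endpoint
# and compare the outputs; divergent or garbled replies under load
# suggest a concurrency bug. BASE_URL and MODEL are assumptions.
import concurrent.futures

import requests

BASE_URL = "http://localhost:8000/v1"  # assumed FastChat OpenAI API server
MODEL = "qwen1.5-14b"                  # assumed registered model name

def ask(i: int) -> str:
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": "Count from 1 to 10."}],
            "temperature": 0,  # greedy decoding, so replies should match
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    replies = list(pool.map(ask, range(8)))

for i, reply in enumerate(replies):
    print(f"--- reply {i} ---\n{reply}\n")
print("all identical:", len(set(replies)) == 1)
```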
Ascend support was introduced in #1408; I tested Baichuan-2 and Qwen.
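To try it out, something like the following should work through the Python client (a sketch only: the endpoint, model name, and launch parameters are assumptions, and the client API may differ between versions; adjust to your setup):

```python
# Exercise the new Ascend support via the Xinference Python client.
# Endpoint and model parameters below are assumptions.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # assumed local xinference endpoint

# Launch a model; device selection on an Ascend host is assumed to be
# handled by the backend per #1408.
model_uid = client.launch_model(
    model_name="qwen1.5-chat",   # assumed model name
    model_format="pytorch",
    model_size_in_billions=14,
)

model = client.get_model(model_uid)
print(model.chat("Hello, who are you?"))
```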