inference BUG 多个请求qwen2-7b模型时，推理会报错 probability tensor contains either `inf`, `nan` or element

Describe the bug

A clear and concise description of what the bug is. 用dify配置xin进行推理，其中已经有一个任务正在持续单点调用qwen2模型。如果再进来一个请求（也就是两个请求一起处理时候）第二个请求看xin容器里日志就报错”probability tensor contains either inf, nan or element < 0“

xinference多个请求推理报错.txt 但是有时候两个模型同时提问也不报错，大概率会报错：

To Reproduce

To help us to reproduce this bug, please provide information below: 用的是容器部署的xinference，版本是 v0.12.0。显卡用的是 RTX A6000，目前看还剩下20多G的显存没用。启动 qwen 2的命令： xinference launch --model-engine transformers -n qwen2-instruct -s 7 -f pytorch --max_model_len 32000

Your Python version. Python 3.10.13
The version of xinference you use. v0.12.0
Versions of crucial packages.

Full stack of the error.
Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

Jun 26 '24 09:06 kolagood

这个应该是模型的问题。

Jun 30 '24 10:06 qinxuye

通过 dify_client 调用 API时，遇到同样的错误，然而在dify自己的聊天界面使用时就没有这个错误。

Jul 10 '24 09:07 veelion

我直接按教程最原始的配置跑也这样，单个对话可以，一旦多个对话，就会自动等待第一个对话完成后，才能进行第二个对话的回复。而用其他框架多个对话调用api时就会直接报错类似这种 RuntimeError: [address=0.0.0.0:42121, pid=481583] probability tensor contains either inf, nan or element < 0 完全满足不了并发要求啊，我测试用的qwen1.5-1.8b-chat，导致现在不敢用下去了