[Question]: Why does the speed slow down after calling the rerank model?

Open fg2501 opened this issue 1 year ago • 2 comments

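I deployed the rerank model with Xinference. Before I used this model, everything I deployed was loaded entirely onto the GPU. After adding this model, part of it gets loaded onto the CPU, and during calls the GPU does not work at full capacity; its utilization is far lower than the memory it occupies would suggest.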

Nov 12 '24 12:11 fg2501

I suggest you submit an issue to Xinference.

Nov 14 '24 01:11 KevinHuSh

OK, I'll go ask there.

Nov 14 '24 10:11 fg2501