Embedding GPU memory keeps growing and causes the model to go offline
System Info / 系統信息
xinference v0.15.1 (the issue has actually been present since 0.14.0); the GPU is an A40.
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
- [X] docker / docker
- [ ] pip install / 通过 pip install 安装
- [ ] installation from source / 从源码安装
Version info / 版本信息
0.15.1
The command used to start Xinference / 用以启动 xinference 的命令
docker run
Reproduction / 复现过程
- Launch the embedding model (I'm using bge-m3 with a chunk_size of 1k)
- Add documents to the vector database via API calls (see the sketch after this list)
- CUDA out of memory
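For reference, a minimal sketch of the ingestion step, assuming Xinference's OpenAI-compatible `/v1/embeddings` endpoint on the default port 9997 and `bge-m3` as the model UID; the URL, batch size, and chunking below are assumptions for illustration, not the reporter's exact pipeline:

```python
# Minimal reproduction sketch (assumptions: default endpoint/port, model UID "bge-m3").
import requests

XINFERENCE_URL = "http://localhost:9997/v1/embeddings"  # assumed endpoint
MODEL_UID = "bge-m3"                                     # assumed model UID

def embed_chunks(chunks, batch_size=64):
    """Send text chunks to the embedding model in batches, as a vector-DB
    ingestion job would. GPU memory reportedly keeps growing during this."""
    vectors = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        resp = requests.post(
            XINFERENCE_URL,
            json={"model": MODEL_UID, "input": batch},
            timeout=300,
        )
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors

if __name__ == "__main__":
    # Roughly 1k-character chunks, matching the chunk_size mentioned above.
    docs = ["some document text " * 50] * 2000
    embed_chunks(docs)
```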
Expected behavior / 期待表现
The embedding model should release GPU memory after requests and keep its memory usage stable.
bge-m3 and bge-reranker-v2-m3 share one GPU. At peak, memory usage spikes to 55 GB (the card itself only has 46 GB), whereas bge-m3 normally occupies less than 3 GB.
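One way to check whether memory is actually released after ingestion finishes is to poll the card while the embedding requests run. A minimal monitoring sketch, assuming the shared card is GPU index 0 and that pynvml (nvidia-ml-py) is installed:

```python
# Memory-monitoring sketch (assumption: the shared card is GPU index 0).
# Prints used/total memory once per second so the spike-and-release
# pattern (or the lack of release) is visible alongside nvidia-smi.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumed device index

try:
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        used_gb = info.used / 1024 ** 3
        total_gb = info.total / 1024 ** 3
        print(f"GPU0 memory: {used_gb:.1f} GiB / {total_gb:.1f} GiB")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```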
I'm running into this issue as well. Is there any solution?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
Is it the case that the embedding model goes offline for no apparent reason, while the LLM stays up?