Gaaaaam
> Solved the same problems by specifying the version of mmcv during installation.
>
> 1. uninstall mmcv with the code below in your virtual env for mmdetection:
>    `mim uninstall...
> Similar to [open-mmlab/mmdetection#11668](https://github.com/open-mmlab/mmdetection/issues/11668)

I followed the suggestion in [#11668](https://github.com/open-mmlab/mmdetection/issues/11668). It works now, thank you!!!
Deployed in a way similar to cli_demo. I queried the model over several rounds, each time sending roughly 7-8k of context along with the corresponding question. The first and second rounds still produced output, but once GPU memory filled up, feeding in long context again produced no output. Since inference is exposed through a FastAPI interface, I found that adding another endpoint that calls torch.cuda.empty_cache() to clear GPU memory lets inference continue. @jklj077

> How was it deployed? web_demo/cli_demo/openai_api/fastchat+vllm? Make sure the conversation history isn't carrying several of those 7-8k rounds at once.
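The workaround above can be sketched as a small helper. This is a minimal illustration, not the poster's actual service: the function name `free_gpu_cache` and the `/free_cache` route are hypothetical, and the code only calls `torch.cuda.empty_cache()` (a real PyTorch API that releases cached, unreferenced CUDA blocks back to the driver) when a GPU is actually available.

```python
import gc


def free_gpu_cache() -> bool:
    """Run the Python GC, then release cached CUDA memory if a GPU is in use.

    Returns True if torch.cuda.empty_cache() was actually called.
    Note: empty_cache() only frees *unreferenced* cached blocks; tensors
    still held by the conversation history keep their memory.
    """
    gc.collect()  # drop unreachable tensors first so their blocks become cacheable
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            return True
    except ImportError:
        pass  # torch not installed; nothing to clear
    return False


# Hypothetical FastAPI wiring, mirroring the extra endpoint described above:
#
#     from fastapi import FastAPI
#     app = FastAPI()
#
#     @app.post("/free_cache")
#     def free_cache():
#         return {"cleared": free_gpu_cache()}
```

Calling such an endpoint between long-context requests frees the allocator's cache, but as the reply notes, the more robust fix is trimming the conversation history so several 7-8k rounds are not resent every turn.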