That's normal. You are using P-Tuning v2, which attaches an extra trainable prefix embedding on top of the frozen base model; the newly trained weights are exactly that part.
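Roughly, loading that prefix part looks like the sketch below. This follows the THUDM/chatglm-6b p-tuning layout; the checkpoint path and `pre_seq_len` value are placeholders, not something taken from this thread.

```python
# Minimal sketch: load a P-Tuning v2 prefix checkpoint on top of the frozen base model.
# Checkpoint path and pre_seq_len are assumptions; adjust to your own training run.
import torch
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# The p-tuning checkpoint contains only the prefix-encoder weights, i.e. the newly trained part.
prefix_state_dict = torch.load("output/ptuning-checkpoint/pytorch_model.bin", map_location="cpu")
prefix_weights = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(prefix_weights)

model = model.half().cuda()
model.transformer.prefix_encoder.float()  # the official scripts keep the prefix encoder in fp32
model.eval()
```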
To run several model instances at the same time, you probably have to do what FastChat or TorchServe do: use multiple processes, each loading its own copy of the model as a separate service. But I also don't understand why api.py can't do this as it is, so I'd like an explanation of the underlying reason too.
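As a rough illustration of the multi-process idea (not the FastChat or TorchServe implementation), something like the sketch below spawns one worker per GPU, each loading its own model copy and pulling prompts from a shared queue. The model name, the `model.chat` call, and the queue protocol are assumptions based on the ChatGLM-6B API.

```python
# Minimal sketch: one process per GPU, each with its own model instance and CUDA context.
import torch.multiprocessing as mp
from transformers import AutoModel, AutoTokenizer

def worker(gpu_id, request_queue, result_queue):
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda(gpu_id)
    model.eval()
    while True:
        req_id, prompt = request_queue.get()
        response, _ = model.chat(tokenizer, prompt, history=[])
        result_queue.put((req_id, response))

if __name__ == "__main__":
    mp.set_start_method("spawn")  # each process needs its own CUDA context
    request_queue, result_queue = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(i, request_queue, result_queue)) for i in range(2)]
    for p in procs:
        p.start()
```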
+1 for this. Why does TEI require CUDA 12.2 while torch is still built against CUDA 12.1, and there is no NVIDIA driver compatible with both 12.1 and 12.2? https://docs.nvidia.com/deploy/cuda-compatibility/index.html
> and there is no nv-driver compatible for both 12.1/12.2

[From the page you linked:](https://docs.nvidia.com/deploy/cuda-compatibility/index.html)

> If you are upgrading the driver to 525.60.13 which...
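A quick way to see whether you are actually hitting a mismatch is to compare the CUDA version torch was built against with the driver version the system reports; with a driver at or above 525.60.13, both CUDA 12.1 and 12.2 builds should run under the same driver thanks to minor version compatibility. This is just a diagnostic sketch.

```python
# Print the CUDA build torch was compiled against and the installed NVIDIA driver version.
import subprocess
import torch

print("torch CUDA build:", torch.version.cuda)  # e.g. 12.1
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("NVIDIA driver:", driver)  # >= 525.60.13 covers CUDA 12.x applications
```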