4 issues of YangYangTx
This is because lmdeploy uses an "aggressive" k/v cache memory allocation strategy. See the explanation in the documentation: https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html#usage

_Originally posted by @lvhan028 in https://github.com/InternLM/lmdeploy/issues/1626#issuecomment-2122040558_
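To make the "aggressive" allocation concrete, here is a rough back-of-the-envelope sketch. It assumes the engine pre-allocates a fixed fraction of the GPU memory left free after the model weights are loaded, which is how the linked documentation describes the `cache_max_entry_count` setting of `TurbomindEngineConfig` in recent releases. The function name `kv_cache_budget_gib` and all the numbers (an 80 GiB card, 14 GiB of weights) are hypothetical and only for illustration; check the documentation for the default ratio in your installed version.

```python
def kv_cache_budget_gib(total_gib, weights_gib, ratio, other_used_gib=0.0):
    """Estimate the memory reserved up front for the k/v cache.

    Assumption: the cache takes `ratio` of whatever GPU memory is still
    free after the weights (and any other processes) are accounted for.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    free_gib = total_gib - weights_gib - other_used_gib
    if free_gib < 0:
        raise ValueError("weights do not fit in GPU memory")
    return free_gib * ratio


if __name__ == "__main__":
    # Hypothetical setup: 80 GiB GPU, 14 GiB of model weights.
    for ratio in (0.8, 0.5, 0.2):
        cache = kv_cache_budget_gib(80.0, 14.0, ratio)
        print(f"ratio={ratio:.1f}: ~{cache:.1f} GiB reserved for k/v cache, "
              f"~{14.0 + cache:.1f} GiB shown as used")
```

Under this assumption, high memory usage right after startup is expected rather than a leak. If the reservation is too large for your workload, lowering the ratio, for example `TurbomindEngineConfig(cache_max_entry_count=0.2)` passed as `backend_config` to `pipeline`, should shrink it.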