DeepSeek-Coder
How can I deploy the 33B model with vLLM? It crashes with an error. Single-node A100, torch 2.0.1, transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How many A100 cards are in the node? Try running with CUDA_LAUNCH_BLOCKING=1; where exactly does the error occur?
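As a reference for the debugging suggestion above, here is a minimal launch sketch (the model path, and serving via vLLM's OpenAI-compatible server, are assumptions for illustration). CUDA_LAUNCH_BLOCKING=1 forces kernels to run synchronously so the Python stack trace points at the actual failing operation instead of a later API call:

```shell
# Synchronous kernel launches: the traceback will name the real failing op
CUDA_LAUNCH_BLOCKING=1 \
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-33b-instruct \
    --tensor-parallel-size 2
```

Note that synchronous launches slow inference down considerably, so this is for debugging only.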
After deploying, the model outputs garbled text. Has anyone else run into this?
Is there a tutorial for deploying with vLLM? Or could someone share the relevant files?
When deploying with vLLM, how do I load the model across multiple GPUs? Even with CUDA_VISIBLE_DEVICES=0,1, only one card loads the model, which is odd. Thanks.
Try adding --tensor-parallel-size 2 (shorthand: -tp 2) to the launch arguments.
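Putting the two pieces together, a sketch of a two-GPU launch (the model path is an assumption; adjust to your local checkpoint). CUDA_VISIBLE_DEVICES only restricts which GPUs vLLM can see; --tensor-parallel-size tells it to actually shard the model across them:

```shell
# Expose two GPUs and shard the 33B model across both via tensor parallelism
CUDA_VISIBLE_DEVICES=0,1 \
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-33b-instruct \
    --tensor-parallel-size 2
```

Without --tensor-parallel-size (default 1), vLLM loads the full model onto a single visible GPU, which matches the single-card behavior described above.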
Thanks, I solved it by setting --tensor-parallel-size > 1.