DeepSeek-Coder
How can I deploy the 33B model with vLLM? It crashes with an error. Single-node A100, torch 2.0.1, transformers 4.35:
key = torch.repeat_interleave(key, self.num_queries_per_kv, dim=1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
How many A100 cards are in the node? Try running with CUDA_LAUNCH_BLOCKING=1; where exactly does the error occur?
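As a reference for the debugging suggestion above, here is a minimal launch sketch (the model path, and serving via vLLM's OpenAI-compatible server, are assumptions for illustration). CUDA_LAUNCH_BLOCKING=1 forces kernels to run synchronously so the Python stack trace points at the actual failing operation instead of a later API call:

```shell
# Synchronous kernel launches: the traceback will name the real failing op
CUDA_LAUNCH_BLOCKING=1 \
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-33b-instruct \
    --tensor-parallel-size 2
```

Note that synchronous launches slow inference down considerably, so this is for debugging only.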
After deploying, the model outputs garbled text. Has anyone else run into this?
Is there a tutorial for deploying with vLLM? Or could someone share the relevant files?
When deploying with vLLM, how do I load the model across multiple GPUs? Even with CUDA_VISIBLE_DEVICES=0,1, only one card loads the model, which is odd. Thanks.
Try adding --tensor-parallel-size 2 (shorthand: -tp 2) to the launch arguments.
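Putting the two pieces together, a sketch of a two-GPU launch (the model path is an assumption; adjust to your local checkpoint). CUDA_VISIBLE_DEVICES only restricts which GPUs vLLM can see; --tensor-parallel-size tells it to actually shard the model across them:

```shell
# Expose two GPUs and shard the 33B model across both via tensor parallelism
CUDA_VISIBLE_DEVICES=0,1 \
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/deepseek-coder-33b-instruct \
    --tensor-parallel-size 2
```

Without --tensor-parallel-size (default 1), vLLM loads the full model onto a single visible GPU, which matches the single-card behavior described above.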
Thanks, I solved it by setting --tensor-parallel-size > 1.