[Bug] RuntimeError: CUDA error: an illegal memory access was encountered
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
RuntimeError: CUDA error: an illegal memory access was encountered. I have found that this error tends to occur when multiple users call the API concurrently; the client code is given in the Reproduction section. Runtime environment (nvidia-smi reports GPU driver 550.144.03; CUDA Version: 12.4):
Package Version Editable project location
--------------------------------- ------------- -------------------------
accelerate 1.4.0
addict 2.4.0
aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aiohttp-cors 0.7.0
aiosignal 1.3.2
airportsdata 20241001
annotated-types 0.7.0
anthropic 0.46.0
anyio 4.8.0
astor 0.8.1
asttokens 3.0.0
async-timeout 5.0.1
attrs 25.1.0
baidu-aip 4.16.13
bcrypt 4.3.0
beautifulsoup4 4.13.3
bitsandbytes 0.45.3
blake3 1.0.4
blinker 1.9.0
Brotli 1.1.0
cachetools 5.5.2
certifi 2025.1.31
cffi 1.17.1
cfgv 3.4.0
chardet 5.2.0
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
colorful 0.5.6
colossalai 0.4.9
compressed-tensors 0.9.1
contexttimer 0.3.3
contourpy 1.3.1
cryptography 44.0.2
cuda-bindings 12.8.0
cuda-python 12.8.0
cycler 0.12.1
datasets 3.3.2
decorator 5.1.1
decord 0.6.0
deepspeed 0.15.4
Deprecated 1.2.18
depyf 0.18.0
diffusers 0.29.0
dill 0.3.8
diskcache 5.6.3
distlib 0.3.9
distro 1.9.0
docstring_parser 0.16
einops 0.8.1
exceptiongroup 1.2.2
executing 2.2.0
fabric 3.2.2
fastapi 0.115.8
filelock 3.17.0
fire 0.7.0
flashinfer-python 0.2.1.post2
Flask 3.1.0
Flask-Cors 5.0.0
fonttools 4.56.0
frozenlist 1.5.0
fsspec 2024.12.0
galore-torch 1.0
gevent 24.11.1
gguf 0.10.0
gmpy2 2.1.5
google 3.0.0
google-api-core 2.24.1
google-auth 2.38.0
googleapis-common-protos 1.68.0
greenlet 3.1.1
grpcio 1.70.0
h11 0.14.0
h2 4.2.0
hf_transfer 0.1.9
hjson 3.1.0
hpack 4.1.0
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.29.1
hyperframe 6.1.0
identify 2.6.9
idna 3.10
importlib_metadata 8.6.1
iniconfig 2.0.0
interegular 0.3.3
invoke 2.2.0
ipdb 0.13.13
ipython 8.32.0
itsdangerous 2.2.0
jedi 0.19.2
Jinja2 3.1.5
jiter 0.8.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.8
lark 1.2.2
litellm 1.61.13
llvmlite 0.44.0
lm-format-enforcer 0.10.10
lmdeploy 0.6.5 /disk2/elivate/lmdeploy
loguru 0.7.3
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.10.0
matplotlib-inline 0.1.7
mdurl 0.1.2
mistral_common 1.5.3
mmengine-lite 0.10.6
modelscope 1.23.1
mpmath 1.3.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.1.0
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.3
nodeenv 1.9.1
numba 0.61.0
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-ml-py 12.570.86
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
openai 1.63.2
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.11.0.86
orjson 3.10.15
outlines 0.1.11
outlines_core 0.1.26
packaging 24.2
pandas 2.2.3
paramiko 3.5.1
parso 0.8.4
partial-json-parser 0.2.1.1.post5
peft 0.11.1
pexpect 4.9.0
pillow 11.1.0
pip 25.0.1
platformdirs 4.3.6
pluggy 1.5.0
plumbum 1.9.0
pre_commit 4.1.0
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.0.2
prompt_toolkit 3.0.50
propcache 0.3.0
proto-plus 1.26.0
protobuf 5.29.3
psutil 7.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
py-spy 0.4.0
pyairports 2.1.1
pyarrow 19.0.1
pyasn1 0.6.1
pyasn1_modules 0.4.1
pybind11 2.13.6
pycountry 24.6.1
pycparser 2.22
pydantic 2.10.6
pydantic_core 2.27.2
Pygments 2.19.1
PyNaCl 1.5.0
pynvml 12.0.0
pyparsing 3.2.1
PySocks 1.7.1
pytest 8.3.4
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.20
pytz 2025.1
PyYAML 6.0.2
pyzmq 26.2.1
ray 2.42.1
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rpds-py 0.23.1
rpyc 6.0.0
rsa 4.9
safetensors 0.4.5
sentencepiece 0.2.0
setproctitle 1.3.4
setuptools 75.8.0
sgl-kernel 0.0.3.post6
shortuuid 1.0.13
shtab 1.7.1
six 1.17.0
smart-open 7.1.0
sniffio 1.3.1
soupsieve 2.6
stack-data 0.6.3
starlette 0.45.3
sympy 1.13.1
termcolor 2.5.0
tiktoken 0.9.0
tokenizers 0.20.3
tomli 2.2.1
torch 2.5.1
torchao 0.8.0
torchaudio 2.5.1
torchvision 0.20.1
tqdm 4.67.1
traitlets 5.14.3
transformers 4.46.3
triton 3.0.0
trl 0.8.6
typeguard 4.4.2
typing_extensions 4.12.2
tyro 0.9.16
tzdata 2025.1
urllib3 2.3.0
uvicorn 0.29.0
uvloop 0.21.0
virtualenv 20.29.2
vllm 0.7.2
watchfiles 1.0.4
wcwidth 0.2.13
websockets 15.0
Werkzeug 3.1.3
wheel 0.45.1
wrapt 1.17.2
xformers 0.0.28.post3
xgrammar 0.1.10
xxhash 3.5.0
yapf 0.43.0
yarl 1.18.3
zipp 3.21.0
zope.event 5.0
zope.interface 7.2
zstandard 0.19.0
Reproduction
Launch command:
lmdeploy serve api_server /disk2/elivate/DeepSeek/DeepSeek-R1 --tp 8 --backend pytorch --chat-template deepseek \
--cache-max-entry-count 0.5 --server-name 0.0.0.0 --server-port 23333
API client code:
# coding:utf-8
from openai import OpenAI
import httpx

# If the client is not running on the same host as the server, change 0.0.0.0 to the
# server's IP (the port 23333 may also need to be changed to the configured port).
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://0.0.0.0:23333/v1",
    http_client=httpx.Client(verify=False)
)
model_name = client.models.list().data[0].id

input_format = '''
任何输出都要有思考过程,输出内容必须以 "<think>\n\n嗯" 开头。仔细揣摩用户意图,之后提供逻辑清晰且内容完整的回答,可以使用Markdown格式优化信息呈现。\n\n
{}'''

# Whether to force deep thinking (the prompt forces the model to emit its reasoning)
use_think = True
input_str = '输入'
if use_think:
    cur_message = [{"role": "user", "content": input_format.format(input_str)}]
else:
    cur_message = [{"role": "user", "content": input_str}]

# # Streaming output (nicer to watch)
# response = client.chat.completions.create(
#     model=model_name,
#     messages=cur_message,
#     temperature=0.8,
#     top_p=0.8,
#     stream=True,
# )
# text = ''
# for chunk in response:
#     # only append non-empty deltas
#     if chunk.choices[0].delta.content:
#         text += chunk.choices[0].delta.content
#         print(chunk.choices[0].delta.content, end='')  # end='' joins the chunks without line breaks
# # print the accumulated text
# print('\n', text)

# Non-streaming output
response = client.chat.completions.create(
    model=model_name,
    messages=cur_message,
    temperature=0.6,
    top_p=0.8,
)
# Print the reply
print(response.choices[0].message.content)
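The crash is reported to occur mainly when several users call the API at the same time, while the script above sends only a single request. The following is a minimal sketch of my own (not part of the original report) that reuses client, model_name and input_format from the script above and keeps several requests in flight to approximate that multi-user load; the helper name one_request, the worker count of 16 and the request count of 200 are arbitrary choices.

# Hypothetical concurrency driver (assumption: appended to the client script above,
# so client, model_name and input_format are already defined).
from concurrent.futures import ThreadPoolExecutor

def one_request(i: int) -> str:
    resp = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": input_format.format(f"test query {i}")}],
        temperature=0.6,
        top_p=0.8,
    )
    return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=16) as pool:
    # print only the index and reply length so the console is not flooded
    for idx, text in enumerate(pool.map(one_request, range(200))):
        print(idx, len(text))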
Environment
/disk2/eliviate/lmdeploy/lmdeploy/cli/entrypoint.py
sys.platform: linux
Python: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) [GCC 13.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA H200
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
PyTorch: 2.5.1+cu124
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.4
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.20.1+cu124
LMDeploy: 0.6.5+
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.8
pydantic: 2.10.6
triton: 3.0.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE PIX PIX NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE PIX PIX NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE NODE NODE NODE PIX PIX SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE NODE NODE PIX PIX SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS PIX PIX NODE NODE 48-95,144-191 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS SYS SYS PIX PIX NODE NODE 48-95,144-191 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS SYS SYS NODE NODE PIX PIX 48-95,144-191 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS SYS SYS NODE NODE PIX PIX 48-95,144-191 1 N/A
NIC0 NODE NODE NODE NODE SYS SYS SYS SYS X PIX NODE NODE NODE NODE SYS SYS SYS SYS
NIC1 NODE NODE NODE NODE SYS SYS SYS SYS PIX X NODE NODE NODE NODE SYS SYS SYS SYS
NIC2 PIX PIX NODE NODE SYS SYS SYS SYS NODE NODE X PIX NODE NODE SYS SYS SYS SYS
NIC3 PIX PIX NODE NODE SYS SYS SYS SYS NODE NODE PIX X NODE NODE SYS SYS SYS SYS
NIC4 NODE NODE PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE X PIX SYS SYS SYS SYS
NIC5 NODE NODE PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE PIX X SYS SYS SYS SYS
NIC6 SYS SYS SYS SYS PIX PIX NODE NODE SYS SYS SYS SYS SYS SYS X PIX NODE NODE
NIC7 SYS SYS SYS SYS PIX PIX NODE NODE SYS SYS SYS SYS SYS SYS PIX X NODE NODE
NIC8 SYS SYS SYS SYS NODE NODE PIX PIX SYS SYS SYS SYS SYS SYS NODE NODE X PIX
NIC9 SYS SYS SYS SYS NODE NODE PIX PIX SYS SYS SYS SYS SYS SYS NODE NODE PIX X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
Error traceback
Error output:
Traceback (most recent call last):
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 482, in _start_tp_process
func(rank, *args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 439, in _tp_model_loop
model_forward(
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 156, in model_forward
output = model(**input_dict)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 149, in __call__
return self.model(**kwargs)
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/models/deepseek_v2.py", line 702, in forward
hidden_states = self.model(
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/models/deepseek_v2.py", line 654, in forward
hidden_states, residual = decoder_layer(
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/models/deepseek_v2.py", line 555, in forward
hidden_states = self.self_attn(
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/disk2/eliviate/lmdeploy/lmdeploy/pytorch/models/deepseek_v2.py", line 256, in forward
query_states[..., nope_size:] = q_pe
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[rank6]:[E314 14:33:46.584848437 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 6] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f0f0116c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f0f011166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f0f01534a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f0eb7025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f0eb702a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f0eb7031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f0eb703361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f0f01c4a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f0f02370ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f0f02402850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
[rank4]:[E314 14:33:46.585238329 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 4] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f5e2fb6c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f5e2fb166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5e2ffd4a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f5de5a25726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f5de5a2a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f5de5a31b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f5de5a3361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f5e306ea5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f5e30e10ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f5e30ea2850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 0 PG GUID 0(default_pg) Rank 6] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f0f0116c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f0f011166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f0f01534a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f0eb7025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f0eb702a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f0eb7031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f0eb703361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f0f01c4a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f0f02370ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f0f02402850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f0f0116c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7f0eb6ca071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f0f01c4a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f0f02370ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7f0f02402850 in /lib/x86_64-linux-gnu/libc.so.6)
what(): [PG ID 0 PG GUID 0(default_pg) Rank 4] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f5e2fb6c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f5e2fb166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5e2ffd4a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f5de5a25726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f5de5a2a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f5de5a31b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f5de5a3361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f5e306ea5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f5e30e10ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f5e30ea2850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f5e2fb6c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7f5de56a071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f5e306ea5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f5e30e10ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7f5e30ea2850 in /lib/x86_64-linux-gnu/libc.so.6)
[rank2]:[E314 14:33:46.588302933 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f4c560b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f4c560636e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4c561a5a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f4c0c025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f4c0c02a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f4c0c031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f4c0c03361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f4c56b785c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f4c5729eac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f4c57330850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
[rank5]:[E314 14:33:46.588801693 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 5] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f581a0b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f581a0636e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f581a1a5a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f57d0025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f57d002a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f57d0031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f57d003361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f581abcf5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f581b2f5ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f581b387850 in /lib/x86_64-linux-gnu/libc.so.6)
[rank3]:[E314 14:33:46.588877826 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 3] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7ff69c16c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff69c1166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7ff69c544a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7ff652025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7ff65202a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7ff652031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7ff65203361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7ff69cc5a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7ff69d380ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7ff69d412850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
terminate called after throwing an instance of 'c10::DistBackendError'
[rank1]:[E314 14:33:46.589305831 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd56276c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fd5627166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fd562b73a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fd518625726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fd51862a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fd518631b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fd51863361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7fd5632895c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7fd5639afac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fd563a41850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
[rank7]:[E314 14:33:46.589658386 ProcessGroupNCCL.cpp:1595] [PG ID 0 PG GUID 0(default_pg) Rank 7] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa19536c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fa1953166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fa19578ca18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fa14b225726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fa14b22a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fa14b231b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fa14b23361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7fa195ea25c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7fa1965c8ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fa19665a850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 0 PG GUID 0(default_pg) Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f4c560b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f4c560636e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4c561a5a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f4c0c025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f4c0c02a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f4c0c031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f4c0c03361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f4c56b785c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f4c5729eac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f4c57330850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f4c560b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7f4c0bca071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f4c56b785c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f4c5729eac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7f4c57330850 in /lib/x86_64-linux-gnu/libc.so.6)
what(): [PG ID 0 PG GUID 0(default_pg) Rank 5] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f581a0b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f581a0636e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f581a1a5a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f57d0025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f57d002a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f57d0031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f57d003361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7f581abcf5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7f581b2f5ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f581b387850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f581a0b9446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7f57cfca071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7f581abcf5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7f581b2f5ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7f581b387850 in /lib/x86_64-linux-gnu/libc.so.6)
what():
[PG ID 0 PG GUID 0(default_pg) Rank 3] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7ff69c16c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7ff69c1166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7ff69c544a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7ff652025726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7ff65202a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7ff652031b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7ff65203361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7ff69cc5a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7ff69d380ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7ff69d412850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7ff69c16c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7ff651ca071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7ff69cc5a5c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7ff69d380ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7ff69d412850 in /lib/x86_64-linux-gnu/libc.so.6)
what(): [PG ID 0 PG GUID 0(default_pg) Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd56276c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fd5627166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fd562b73a18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fd518625726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fd51862a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fd518631b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fd51863361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7fd5632895c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7fd5639afac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fd563a41850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fd56276c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7fd5182a071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7fd5632895c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7fd5639afac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7fd563a41850 in /lib/x86_64-linux-gnu/libc.so.6)
what():
[PG ID 0 PG GUID 0(default_pg) Rank 7] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa19536c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fa1953166e4 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fa19578ca18 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fa14b225726 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7fa14b22a3f0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7fa14b231b5a in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7fa14b23361d in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x7fa195ea25c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x7fa1965c8ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fa19665a850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fa19536c446 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x7fa14aea071b in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x7fa195ea25c0 in /disk2/condaenvs/deepseek/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x7fa1965c8ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7fa19665a850 in /lib/x86_64-linux-gnu/libc.so.6)
/disk2/condaenvs/deepseek/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
You can try our latest main branch; it includes a fix for a boundary-condition check in a MoE kernel.
@github-eliviate @grimoire Has this issue been resolved? I installed the latest lmdeploy==0.7.2 today and ran into the same problem when serving Qwen2.5-VL-72B.
@zhyxun Please provide the reproduction steps and environment info.
@grimoire
Environment info:
NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.4
Python version: Python 3.10.12
Package Version
accelerate 0.33.0 addict 2.4.0 aiofiles 23.2.1 aiohappyeyeballs 2.4.0 aiohttp 3.10.5 aiosignal 1.3.1 airportsdata 20250224 annotated-types 0.7.0 anyio 4.4.0 argcomplete 3.6.0 async-timeout 4.0.3 attrs 24.2.0 av 14.2.0 certifi 2024.7.4 cfgv 3.4.0 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.1.1 cmake 3.30.2 contourpy 1.2.1 cycler 0.12.1 datasets 2.21.0 decord 0.6.0 dill 0.3.8 diskcache 5.6.3 distlib 0.3.9 distro 1.9.0 einops 0.8.0 exceptiongroup 1.2.2 fastapi 0.112.2 ffmpy 0.4.0 filelock 3.13.1 fire 0.6.0 flash-attn 2.6.3 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.2.0 genson 1.3.0 gradio 4.42.0 gradio_client 1.3.0 grpcio 1.66.0 h11 0.14.0 httpcore 1.0.5 httpx 0.27.0 huggingface-hub 0.29.3 identify 2.6.9 idna 3.8 importlib_metadata 8.4.0 importlib_resources 6.4.4 interegular 0.3.3 iso3166 2.1.1 Jinja2 3.1.3 jiter 0.5.0 jsonschema 4.23.0 jsonschema-specifications 2024.10.1 kiwisolver 1.4.5 lark 1.2.2 lmdeploy 0.7.2 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.2 mdurl 0.1.2 mmengine-lite 0.10.4 mpmath 1.3.0 msgpack 1.1.0 multidict 6.0.5 multiprocess 0.70.16 nest-asyncio 1.6.0 networkx 3.2.1 nodeenv 1.9.1 numpy 1.26.3 nvidia-cublas-cu12 12.4.5.8 nvidia-cuda-cupti-cu12 12.4.127 nvidia-cuda-nvrtc-cu12 12.4.127 nvidia-cuda-runtime-cu12 12.4.127 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.2.1.3 nvidia-curand-cu12 10.3.5.147 nvidia-cusolver-cu12 11.6.1.9 nvidia-cusparse-cu12 12.3.1.170 nvidia-nccl-cu12 2.21.5 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.4.127 openai 1.42.0 orjson 3.10.7 outlines 0.2.1 outlines_core 0.1.26 packaging 24.1 pandas 2.2.2 partial-json-parser 0.2.1.1.post5 peft 0.11.1 pillow 10.2.0 pip 25.0.1 pipx 1.7.1 platformdirs 4.2.2 pre_commit 4.2.0 protobuf 4.25.4 psutil 6.0.0 pyarrow 17.0.0 pybind11 2.13.1 pydantic 2.8.2 pydantic_core 2.20.1 pydub 0.25.1 Pygments 2.18.0 pynvml 11.5.3 pyparsing 3.1.4 python-dateutil 2.9.0.post0 python-multipart 0.0.9 python-rapidjson 1.20 pytz 2024.1 PyYAML 6.0.2 qwen-vl-utils 0.0.8 ray 2.43.0 referencing 0.36.2 regex 2024.7.24 requests 2.32.3 rich 13.7.1 rpds-py 0.23.1 ruff 0.6.2 safetensors 0.4.4 semantic-version 2.10.0 sentencepiece 0.2.0 setuptools 69.5.1 shellingham 1.5.4 shortuuid 1.0.13 six 1.16.0 sniffio 1.3.1 starlette 0.38.2 sympy 1.13.1 termcolor 2.4.0 tiktoken 0.7.0 timm 1.0.9 tokenizers 0.21.1 tomli 2.0.1 tomlkit 0.12.0 torch 2.5.1 torchvision 0.20.1 tqdm 4.66.5 transformers 4.49.0 transformers-stream-generator 0.0.5 triton 3.1.0 tritonclient 2.48.0 typer 0.12.5 typing_extensions 4.12.2 tzdata 2024.1 urllib3 2.2.2 userpath 1.9.2 uvicorn 0.30.6 virtualenv 20.29.3 websockets 12.0 wheel 0.44.0 xxhash 3.5.0 yapf 0.40.2 yarl 1.9.4 zipp 3.20.0
Reproduction
Code:
from torch.utils.data import Dataset
from openai import OpenAI
import base64
import json
from torch.utils.data.dataloader import DataLoader


def caption_collate_fn(file_meta_batch):
    return file_meta_batch


class QA_dataset(Dataset):
    def __init__(
        self,
        base_url: str = None,
    ) -> None:
        self.image_paths = ["xxx.jpg"] * 10000  ## can be set to any single image
        self.prompt = "请描述这张图片"
        self.client = OpenAI(api_key='AABBCCDD', base_url=base_url)
        self.server_model_name = self.client.models.list().data[0].id
        print("LLM/VLM Model:", self.server_model_name)

    def load_bytes_from_image(self, image_path):
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read())
        return encoded_string.decode("utf-8")  # convert bytes to a string

    def __len__(self) -> int:
        """Get length of current self.files."""
        return len(self.image_paths)

    def __getitem__(self, index):
        """Get item."""
        image_path = self.image_paths[index]
        try:
            base64_data = self.load_bytes_from_image(image_path)
        except:
            return None
        try:
            response = self.client.chat.completions.create(
                model=self.server_model_name,
                messages=[{
                    'role': 'user',
                    'content': [
                        {
                            'type': 'text',
                            'text': self.prompt,
                        },
                        {
                            'type': 'image_url',
                            'image_url': {
                                'url': f"data:image/jpeg;base64,{base64_data}",
                            },
                        }
                    ],
                }],
                temperature=0.8,
                top_p=0.95,
            )
            # parse the response
            response_txt = response.choices[0].message.content
            print(response_txt)
            message = {
                "conversations": response_txt,
                "image_path": image_path,
            }
        except:
            return None
        return message


def main():
    ## load dataset
    dataset = QA_dataset(
        base_url="http://172.24.208.140:23333/v1",
    )
    dataloader = DataLoader(
        dataset,
        batch_size=64,
        shuffle=False,
        num_workers=64,
        collate_fn=caption_collate_fn,
        prefetch_factor=64,
        drop_last=False,
    )
    fout = open("./output.jsonl", "w")
    for batch_id, meta_data in enumerate(dataloader):
        for sample in meta_data:
            if sample is not None:
                fout.write(json.dumps(sample, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    main()
Launch command:
lmdeploy serve api_server ./pretrained_vlm/Qwen2.5-VL-72B-Instruct --server-port 23333 --tp 4 --cache-max-entry-count 0.4
@zhyxun I can't reproduce it on my side. Could you provide an image that is known to trigger the error, and roughly how much data needs to be processed before it happens?
@grimoire Could you try an image like this one? The error shows up after roughly five or six minutes of running, sometimes even sooner, within two or three minutes.
Does it run fine for you?
@zhyxun Please give https://github.com/InternLM/lmdeploy/pull/3307 a try.
@github-eliviate @grimoire Has this issue been resolved? I installed the latest lmdeploy==0.7.2 today and ran into the same problem when serving Qwen2.5-VL-72B.
Since installing it, that error has not appeared again. However, when our developers call the service, the server reports a different error:
lmdeploy - ERROR - async_engine.py:592 - [safe_run] exception caught: GeneratorExit
This error does not cause the server to exit, though.
lmdeploy - ERROR - async_engine.py:592 - [safe_run] exception caught: GeneratorExit
This is generally a request/connection-related error and has little to do with the engine. @AllentDan, could you take a look?
lmdeploy - ERROR - async_engine.py:592 - [safe_run] exception caught: GeneratorExit
This is usually produced when the client aborts the request; it does not affect the service.
It indeed does not affect the service. However, the developers report that with the same code a request sometimes returns and sometimes does not; the only difference is the content submitted, and they never cancel a request on purpose.
I also ran into this problem, when using Qwen2.5-VL 32B.
@github-eliviate @natsunoshion Are you using the proxy feature? If the connection times out, the request may be interrupted.
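If client-side timeouts turn out to be the cause of the interrupted requests, one thing that could be checked (a sketch only, not something confirmed in this thread) is giving the OpenAI client an explicitly generous timeout, so that long generations are not cut off by the default HTTP timeout and the server does not see the connection drop as a GeneratorExit. The endpoint and key below are the placeholders used earlier in this thread, and the 600 s / 10 s values are arbitrary assumptions.

import httpx
from openai import OpenAI

# Sketch: raise the client-side timeout so slow generations are not dropped mid-stream.
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url="http://0.0.0.0:23333/v1",
    timeout=httpx.Timeout(600.0, connect=10.0),  # 600 s read/write/pool, 10 s connect
)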
the only difference is the content submitted
If the content makes the difference, could you paste an example?