[Bug]: NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Your current environment
vLLM 0.4.0.post1 docker image
How it was run:
docker run -d \
--runtime=nvidia \
--gpus '"device=0,1"' \
--shm-size=10.24gb \
-p 5002:5002 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache:/home/ubuntu/.cache/ -v "${HOME}"/.config:/home/ubuntu/.config/ -v "${HOME}"/.config:/home/ubuntu/.triton/ \
--network host \
vllm/vllm-openai:latest \
--port=5002 \
--host=0.0.0.0 \
--model=mistralai/Mixtral-8x7B-Instruct-v0.1 \
--seed 1234 \
--trust-remote-code \
--tensor-parallel-size=2 \
--dtype auto \
--max-num-batched-tokens 131072 \
--max-log-len=100 \
--download-dir=/home/ubuntu/.cache/huggingface/hub &>> logs.vllm_server.2gpus.mixtral.txt
On:
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-97-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100 80GB PCIe
GPU 1: NVIDIA A100 80GB PCIe
GPU 2: NVIDIA A100 80GB PCIe
GPU 3: NVIDIA A100 80GB PCIe
Nvidia driver version: 535.161.07
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 126
On-line CPU(s) list: 0-125
Thread(s) per core: 1
Core(s) per socket: 126
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD EPYC 7763 64-Core Processor
Stepping: 1
CPU MHz: 2445.406
BogoMIPS: 4890.81
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 7.9 MiB
L1i cache: 7.9 MiB
L2 cache: 63 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-125
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip pku ospke vaes vpclmulqdq rdpid fsrm arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.2 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.2.7
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PHB PHB PHB 0-125 0 N/A
GPU1 PHB X PHB PHB 0-125 0 N/A
GPU2 PHB PHB X PHB 0-125 0 N/A
GPU3 PHB PHB PHB X 0-125 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
🐛 Describe the bug
After about 5 days of uptime, the server eventually hit this error. Note the endpoint was heavily used for all 5 days; nothing special was happening, apart from maybe more guided_json requests than usual today.
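For context, a guided_json request against this server looks roughly like the hedged sketch below (the schema, prompt, and client usage are illustrative; only the port and model name come from the docker command above):

```python
# Hypothetical example of the kind of guided_json request the endpoint was serving.
# Assumes the openai Python client and vLLM's guided decoding via extra_body.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5002/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

completion = client.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    prompt="Extract the person as JSON: John is 42 years old.",
    max_tokens=128,
    extra_body={"guided_json": schema},  # vLLM-specific guided decoding field
)
print(completion.choices[0].text)
```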
INFO 04-16 00:03:23 metrics.py:218] Avg prompt throughput: 2824.3 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 10.2%, CPU KV cache usage: 0.0%
[36m(RayWorkerVllm pid=7046)[0m [E ProcessGroupNCCL.cpp:916] [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[36m(RayWorkerVllm pid=7046)[0m CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[36m(RayWorkerVllm pid=7046)[0m For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[36m(RayWorkerVllm pid=7046)[0m Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
[36m(RayWorkerVllm pid=7046)[0m frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5144192617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f514414d98d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5144530128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f4da9f2f250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f4da9f33078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f4da9f49910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f4da9f49c18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #7: <unknown function> + 0xdc253 (0x7f5149847253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #8: <unknown function> + 0x94ac3 (0x7f514b686ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #9: clone + 0x44 (0x7f514b717a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,069 E 7046 7269] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[36m(RayWorkerVllm pid=7046)[0m CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[36m(RayWorkerVllm pid=7046)[0m For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[36m(RayWorkerVllm pid=7046)[0m Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
[36m(RayWorkerVllm pid=7046)[0m frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5144192617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f514414d98d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5144530128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f4da9f2f250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f4da9f33078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f4da9f49910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f4da9f49c18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #7: <unknown function> + 0xdc253 (0x7f5149847253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #8: <unknown function> + 0x94ac3 (0x7f514b686ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #9: clone + 0x44 (0x7f514b717a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m [E ProcessGroupNCCL.cpp:916] [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[36m(RayWorkerVllm pid=7046)[0m CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[36m(RayWorkerVllm pid=7046)[0m For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[36m(RayWorkerVllm pid=7046)[0m Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
[36m(RayWorkerVllm pid=7046)[0m frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5144192617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f514414d98d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5144530128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f4da9f2f250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f4da9f33078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f4da9f49910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f4da9f49c18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #7: <unknown function> + 0xdc253 (0x7f5149847253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #8: <unknown function> + 0x94ac3 (0x7f514b686ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #9: clone + 0x44 (0x7f514b717a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,071 E 7046 7284] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
[36m(RayWorkerVllm pid=7046)[0m CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[36m(RayWorkerVllm pid=7046)[0m For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[36m(RayWorkerVllm pid=7046)[0m Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
[36m(RayWorkerVllm pid=7046)[0m frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5144192617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f514414d98d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
[36m(RayWorkerVllm pid=7046)[0m frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5144530128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f4da9f2f250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f4da9f33078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f4da9f49910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f4da9f49c18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
[36m(RayWorkerVllm pid=7046)[0m frame #7: <unknown function> + 0xdc253 (0x7f5149847253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #8: <unknown function> + 0x94ac3 (0x7f514b686ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m frame #9: clone + 0x44 (0x7f514b717a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,080 E 7046 7269] logging.cc:104: Stack trace:
[36m(RayWorkerVllm pid=7046)[0m /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe543a) [0x7f514a97c43a] ray::operator<<()
[36m(RayWorkerVllm pid=7046)[0m /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe7b78) [0x7f514a97eb78] ray::TerminateHandler()
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f514981920c]
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f5149819277]
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7f51498191fe]
[36m(RayWorkerVllm pid=7046)[0m /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xc86f5b) [0x7f4da9cb4f5b] c10d::ProcessGroupNCCL::ncclCommWatchdog()
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f5149847253]
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f514b686ac3]
[36m(RayWorkerVllm pid=7046)[0m /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f514b717a04] __clone
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m *** SIGABRT received at time=1713225805 on cpu 21 ***
[36m(RayWorkerVllm pid=7046)[0m PC: @ 0x7f514b6889fc (unknown) pthread_kill
[36m(RayWorkerVllm pid=7046)[0m @ 0x7f514b634520 (unknown) (unknown)
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,080 E 7046 7269] logging.cc:361: *** SIGABRT received at time=1713225805 on cpu 21 ***
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,080 E 7046 7269] logging.cc:361: PC: @ 0x7f514b6889fc (unknown) pthread_kill
[36m(RayWorkerVllm pid=7046)[0m [2024-04-16 00:03:25,080 E 7046 7269] logging.cc:361: @ 0x7f514b634520 (unknown) (unknown)
[36m(RayWorkerVllm pid=7046)[0m Fatal Python error: Aborted
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m
[36m(RayWorkerVllm pid=7046)[0m Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, simplejson._speedups, uvloop.loop, ray._raylet, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, sentencepiece._sentencepiece, pyarrow.lib, pyarrow._hdfsio, pyarrow._json, PIL._imaging, __triton_launcher, cuda_utils (total: 37)
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f48b76de617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f48b769998d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f48b779a128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f48432c5250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f48432c9078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f48432df910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f48432dfc18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4887ab0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f48c3b30ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f48c3bc1a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-04-16 00:03:25,191 E 1 7270] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f48b76de617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f48b769998d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f48b779a128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f48432c5250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f48432c9078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f48432df910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f48432dfc18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4887ab0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f48c3b30ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f48c3bc1a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f48b76de617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f48b769998d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f48b779a128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f48432c5250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f48432c9078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f48432df910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f48432dfc18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4887ab0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f48c3b30ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f48c3bc1a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-04-16 00:03:25,207 E 1 7285] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f48b76de617 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f48b769998d in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f48b779a128 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f48432c5250 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f48432c9078 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x250 (0x7f48432df910 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f48432dfc18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f4887ab0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f48c3b30ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f48c3bc1a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
ERROR 04-16 00:03:25 async_llm_engine.py:43] Engine background task failed
ERROR 04-16 00:03:25 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 04-16 00:03:25 async_llm_engine.py:43] task.result()
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/engine/async_llm_engine.py", line 479, in run_engine_loop
ERROR 04-16 00:03:25 async_llm_engine.py:43] has_requests_in_progress = await asyncio.wait_for(
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 04-16 00:03:25 async_llm_engine.py:43] return fut.result()
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/engine/async_llm_engine.py", line 453, in engine_step
ERROR 04-16 00:03:25 async_llm_engine.py:43] request_outputs = await self.engine.step_async()
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 04-16 00:03:25 async_llm_engine.py:43] output = await self.model_executor.execute_model_async(
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/executor/ray_gpu_executor.py", line 422, in execute_model_async
ERROR 04-16 00:03:25 async_llm_engine.py:43] all_outputs = await self._run_workers_async(
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/executor/ray_gpu_executor.py", line 412, in _run_workers_async
ERROR 04-16 00:03:25 async_llm_engine.py:43] all_outputs = await asyncio.gather(*coros)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 04-16 00:03:25 async_llm_engine.py:43] result = self.fn(*self.args, **self.kwargs)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 04-16 00:03:25 async_llm_engine.py:43] return func(*args, **kwargs)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/worker/worker.py", line 221, in execute_model
ERROR 04-16 00:03:25 async_llm_engine.py:43] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 04-16 00:03:25 async_llm_engine.py:43] return func(*args, **kwargs)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/worker/model_runner.py", line 673, in execute_model
ERROR 04-16 00:03:25 async_llm_engine.py:43] output = self.model.sample(
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/model_executor/models/mixtral.py", line 394, in sample
ERROR 04-16 00:03:25 async_llm_engine.py:43] next_tokens = self.sampler(logits, sampling_metadata)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 04-16 00:03:25 async_llm_engine.py:43] return self._call_impl(*args, **kwargs)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 04-16 00:03:25 async_llm_engine.py:43] return forward_call(*args, **kwargs)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/model_executor/layers/sampler.py", line 76, in forward
ERROR 04-16 00:03:25 async_llm_engine.py:43] sample_results = _sample(probs, logprobs, sampling_metadata,
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/model_executor/layers/sampler.py", line 502, in _sample
ERROR 04-16 00:03:25 async_llm_engine.py:43] return _sample_with_torch(probs, logprobs, sampling_metadata)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/model_executor/layers/sampler.py", line 399, in _sample_with_torch
ERROR 04-16 00:03:25 async_llm_engine.py:43] sample_results = _greedy_sample(seq_groups, greedy_samples)
ERROR 04-16 00:03:25 async_llm_engine.py:43] File "/workspace/vllm/model_executor/layers/sampler.py", line 214, in _greedy_sample
ERROR 04-16 00:03:25 async_llm_engine.py:43] samples = samples.tolist()
ERROR 04-16 00:03:25 async_llm_engine.py:43] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 04-16 00:03:25 async_llm_engine.py:43] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 04-16 00:03:25 async_llm_engine.py:43] For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR 04-16 00:03:25 async_llm_engine.py:43] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 04-16 00:03:25 async_llm_engine.py:43]
ERROR:asyncio:Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f47751041f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f479b4fc910>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f47751041f0>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7f479b4fc910>>)>
Traceback (most recent call last):
File "/workspace/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/workspace/vllm/engine/async_llm_engine.py", line 479, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/workspace/vllm/engine/async_llm_engine.py", line 453, in engine_step
request_outputs = await self.engine.step_async()
File "/workspace/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/workspace/vllm/executor/ray_gpu_executor.py", line 422, in execute_model_async
all_outputs = await self._run_workers_async(
File "/workspace/vllm/executor/ray_gpu_executor.py", line 412, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/vllm/worker/worker.py", line 221, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/vllm/worker/model_runner.py", line 673, in execute_model
output = self.model.sample(
File "/workspace/vllm/model_executor/models/mixtral.py", line 394, in sample
next_tokens = self.sampler(logits, sampling_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/vllm/model_executor/layers/sampler.py", line 76, in forward
sample_results = _sample(probs, logprobs, sampling_metadata,
File "/workspace/vllm/model_executor/layers/sampler.py", line 502, in _sample
return _sample_with_torch(probs, logprobs, sampling_metadata)
File "/workspace/vllm/model_executor/layers/sampler.py", line 399, in _sample_with_torch
sample_results = _greedy_sample(seq_groups, greedy_samples)
File "/workspace/vllm/model_executor/layers/sampler.py", line 214, in _greedy_sample
samples = samples.tolist()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/workspace/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 04-16 00:03:25 async_llm_engine.py:154] Aborted request cmpl-dfc7112541c14e93b9996e354d51fe7e-0.
INFO 04-16 00:03:25 async_llm_engine.py:154] Aborted request cmpl-7198ad674747410698402a13a1000014-0.
INFO 04-16 00:03:25 async_llm_engine.py:154] Aborted request cmpl-7f59bec40147410fbd3598b48c7c3d09-0.
INFO 04-16 00:03:25 async_llm_engine.py:154] Aborted request cmpl-834ce903dfa0491b9bc94e76acc1bb02-0.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
await func()
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f37e4599390
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
async with anyio.create_task_group() as task_group:
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
await func()
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f37dc1f5570
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
async with anyio.create_task_group() as task_group:
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
await func()
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f37dc1f5960
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
async with anyio.create_task_group() as task_group:
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
await func()
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f37e4703190
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
async with anyio.create_task_group() as task_group:
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
[2024-04-16 00:03:25,217 E 1 7270] logging.cc:104: Stack trace:
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe543a) [0x7f47749a243a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe7b78) [0x7f47749a4b78] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f4887a8220c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f4887a82277]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7f4887a821fe]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xc86f5b) [0x7f484304af5b] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f4887ab0253]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f48c3b30ac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f48c3bc1a04] __clone
*** SIGABRT received at time=1713225805 on cpu 77 ***
PC: @ 0x7f48c3b329fc (unknown) pthread_kill
@ 0x7f48c3ade520 (unknown) (unknown)
[2024-04-16 00:03:25,217 E 1 7270] logging.cc:361: *** SIGABRT received at time=1713225805 on cpu 77 ***
[2024-04-16 00:03:25,217 E 1 7270] logging.cc:361: PC: @ 0x7f48c3b329fc (unknown) pthread_kill
[2024-04-16 00:03:25,217 E 1 7270] logging.cc:361: @ 0x7f48c3ade520 (unknown) (unknown)
Fatal Python error: Aborted
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, simplejson._speedups, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, pyarrow.lib, pyarrow._hdfsio, pyarrow._json, PIL._imaging, __triton_launcher, cuda_utils, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering (total: 77)
[failure_signal_handler.cc : 332] RAW: Signal 11 raised at PC=0x7f48c3ac4898 while already in AbslFailureSignalHandler()
*** SIGSEGV received at time=1713225805 on cpu 77 ***
PC: @ 0x7f48c3ac4898 (unknown) abort
@ 0x7f48c3ade520 (unknown) (unknown)
[2024-04-16 00:03:25,219 E 1 7285] logging.cc:104: Stack trace:
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe543a) [0x7f47749a243a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe7b78) [0x7f47749a4b78] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f4887a8220c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f4887a82277]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7f4887a821fe]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xc86f5b) [0x7f484304af5b] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f4887ab0253]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f48c3b30ac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f48c3bc1a04] __clone
@ 0x7f46e4c14640 (unknown) (unknown)
[2024-04-16 00:03:25,221 E 1 7270] logging.cc:361: *** SIGSEGV received at time=1713225805 on cpu 77 ***
[2024-04-16 00:03:25,221 E 1 7270] logging.cc:361: PC: @ 0x7f48c3ac4898 (unknown) abort
[2024-04-16 00:03:25,221 E 1 7270] logging.cc:361: @ 0x7f48c3ade520 (unknown) (unknown)
[2024-04-16 00:03:25,223 E 1 7270] logging.cc:361: @ 0x7f46e4c14640 (unknown) (unknown)
Fatal Python error: Segmentation fault
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, simplejson._speedups, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, pyarrow.lib, pyarrow._hdfsio, pyarrow._json, PIL._imaging, __triton_launcher, cuda_utils, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups, _cffi_backend, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering (total: 77)
Experiencing the same when using LoRA requests...
Hi! I'm seeing the same issue when using LoRA. Do you have a solution?
When I load the Llama model, some GPUs hit this while others are fine.
I'm also seeing the same issue on a clean server installation in GCP. My steps to reproduce were (a minimal Python sketch follows the list):
- run an instance in GCP using the c0-deeplearning-common-cu121-v20240417-debian-11 image and 2x A100 40GB GPUs
- log in; it asks to install the NVIDIA driver; accept
- check nvidia-smi: the driver installed successfully
- now I'm in a clean environment with conda (base)
- pip install vllm
- optionally: pip install flash-attn
- run the vLLM OpenAI API server (I used a Code Llama model)
- Got:
CUDA error: an illegal memory access was encountered
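For reference, here is a minimal Python sketch that exercises roughly the same engine path via vLLM's offline API (the model name, parallelism, and prompt are assumptions for illustration; the actual repro used the OpenAI API server):

```python
# Minimal sketch only: model name and settings are placeholders, not the exact repro.
from vllm import LLM, SamplingParams

llm = LLM(
    model="codellama/CodeLlama-7b-Instruct-hf",  # assumed stand-in for "a Code Llama model"
    tensor_parallel_size=2,                      # matches the 2x A100 40GB instance
)
params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)
```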
Still seeing this on Mixtral
INFO: 172.16.0.88:2118 - "POST /v1/completions HTTP/1.1" 200 OK
[rank0]:[E ProcessGroupNCCL.cpp:1414] [PG 0 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7c5ec7d7a897 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7c5ec7d2ab25 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7c5ec818b718 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7c5e7ba4ae36 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7c5e7ba4ef38 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x77c (0x7c5e7ba545ac in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7c5e7ba5531c in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7c5ec74b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7c5ec8a92ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7c5ec8b23a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[rank0]:[E ProcessGroupNCCL.cpp:1414] [PG 1 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7c5ec7d7a897 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7c5ec7d2ab25 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7c5ec818b718 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7c5e7ba4ae36 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7c5e7ba4ef38 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x77c (0x7c5e7ba545ac in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7c5e7ba5531c in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7c5ec74b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7c5ec8a92ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7c5ec8b23a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-05-17 07:35:09,516 E 1 6539] logging.cc:101: Unhandled exception: N3c1016DistBackendErrorE. what(): [PG 1 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
:
Still seeing this on a totally different H100 system.
Same problem here with H100s and the latest vllm==0.4.2.
@pseudotensor I have discovered an integer overflow in the fused_moe_kernel, a Triton kernel called by MoE models. The overflow will sometimes cause CUDA illegal memory access issues. I don't know if this overflow is the cause of your failure, but since you are using the Mixtral model (a MoE), you might be affected. If you'd like to check, you can add the following assertion here:
tl.device_assert(off_experts * stride_be >= 0, "off_experts * stride_be overflows!")
and then rerun your program with the environment variables CUDA_LAUNCH_BLOCKING=1 and TRITON_DEBUG=1 set (inside the docker container), and with the flag --enforce-eager passed to the docker entrypoint?
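For anyone unsure why that product can go negative, here is a small self-contained illustration of the suspected 32-bit overflow (the numeric values are made up; in the real kernel off_experts and stride_be are integer values inside the Triton kernel):

```python
# Illustration of the suspected int32 overflow; the values are hypothetical.
import numpy as np

off_experts = np.array(5, dtype=np.int32)           # expert index chosen for a block
stride_be = np.array(700_000_000, dtype=np.int32)   # stride of a large expert weight tensor

offset = off_experts * stride_be  # 3_500_000_000 does not fit in int32 and wraps around
print(int(offset))                # -794967296: a negative, hence illegal, memory offset

# This is exactly the condition the suggested on-device check would catch:
# tl.device_assert(off_experts * stride_be >= 0, "off_experts * stride_be overflows!")
```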
Same problem here when running Llama-7B with input_len >= 4096 and tensor_parallel_size > 1, on 8x A800. Did anyone solve it?
@pseudotensor I have discovered an integer overflow in the fused_moe_kernel, a Triton kernel called by MoE models. The overflow will sometimes cause CUDA illegal memory access issues. I don't know if this overflow is the cause of your failure, but since you are using the Mixtral model (a MoE), you might be affected. If you'd like to check, you can add the following assertion here: tl.device_assert(off_experts * stride_be >= 0, "off_experts * stride_be overflows!") and then rerun your program with the environment variables CUDA_LAUNCH_BLOCKING=1 and TRITON_DEBUG=1 set (inside the docker container), and with the flag --enforce-eager passed to the docker entrypoint?
The same error occurs for me. Did you solve it, or is there a way to work around it?
Still seeing this, but only when using LoRA. I am currently using Llama-3-8B with tensor_parallel_size=8 and max_model_len=1250. The same run without LoRA works flawlessly.
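For reference, the failing LoRA path corresponds roughly to this kind of call; this is a hedged sketch using vLLM's LoRA API, where the exact base model checkpoint, adapter name, and adapter path are placeholders:

```python
# Sketch of a LoRA-enabled vLLM run; checkpoint and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    enable_lora=True,
    tensor_parallel_size=8,
    max_model_len=1250,
)
outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```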
This might be related: https://stackoverflow.com/questions/68106457/pytorch-cuda-error-an-illegal-memory-access-was-encountered
The root problem could be an OOM caused by prefix caching. The suggested fix in the post above is to call torch.cuda.empty_cache(), so that would make sense.
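If someone wants to try that workaround, here is a minimal sketch of the generic PyTorch call from the linked post (it releases PyTorch's cached, unused GPU memory; it is not a vLLM-specific fix and will not undo an already-raised illegal access):

```python
# Workaround sketch from the linked post: release cached CUDA memory blocks.
import torch

def release_cached_gpu_memory() -> None:
    """Return cached, unused GPU memory to the driver (does not free live tensors)."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # let pending kernels finish first
        torch.cuda.empty_cache()   # drop PyTorch's cached allocator blocks

release_cached_gpu_memory()
```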
Closing, since the original Mixtral model no longer hits this on vLLM 0.4.3 and later.