
CUDA error when loading Mixtral model

hxdtest opened this issue on Jan 03, 2024 · 0 comments

```
(RayWorkerVllm pid=7009)   warnings.warn("Initializing zero-element tensors is a no-op")
INFO 01-03 15:52:10 llm_engine.py:223] # GPU blocks: 86934, # CPU blocks: 8192
(RayWorkerVllm pid=7009) INFO 01-03 15:52:13 model_runner.py:394] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
(RayWorkerVllm pid=7009) [E ProcessGroupNCCL.cpp:915] [Rank 0] NCCL watchdog thread terminated with exception: CUDA error: operation not permitted when stream is capturing
(RayWorkerVllm pid=7009) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorkerVllm pid=7009) For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
(RayWorkerVllm pid=7009) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
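The error surfaces while vLLM is capturing CUDA graphs (the `model_runner.py` line above), and the log itself points at the workaround: disable graph capture by forcing eager mode. A minimal sketch of that workaround using the offline `LLM` API is shown below; the checkpoint name and `tensor_parallel_size` are assumptions for illustration, not taken from this log.

```python
# Sketch of the eager-mode workaround suggested in the log above.
# The model name and tensor_parallel_size are assumed values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed Mixtral checkpoint
    tensor_parallel_size=2,                        # assumed GPU count
    enforce_eager=True,                            # skip CUDA graph capture
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

When serving via the CLI instead, the equivalent is the `--enforce-eager` flag mentioned in the log message.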
