vllm
[Bug]: Shutdown during Qwen2.5-VL-72B inference on 4 A800s
Your current environment
The output of `python collect_env.py`
PyTorch version: 2.5.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.1
Libc version: glibc-2.31
Python version: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.7.1.el8_5.x86_64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.3.107
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Nvidia driver version: 535.161.08
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 116
On-line CPU(s) list: 0-115
Thread(s) per core: 2
Core(s) per socket: 29
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60GHz
Stepping: 6
CPU MHz: 2599.996
BogoMIPS: 5199.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 2.7 MiB
L1i cache: 1.8 MiB
L2 cache: 72.5 MiB
L3 cache: 96 MiB
NUMA node0 CPU(s): 0-57
NUMA node1 CPU(s): 58-115
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid md_clear arch_capabilities
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-ml-py==12.570.86
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pynvml==12.0.0
[pip3] pytorchvideo==0.1.5
[pip3] pyzmq==26.2.1
[pip3] sentence-transformers==3.4.1
[pip3] torch==2.5.0+cu124
[pip3] torchaudio==2.5.0+cu124
[pip3] torchvision==0.20.0+cu124
[pip3] transformers==4.49.0
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.1.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
[conda] nvidia-ml-py 12.570.86 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] pynvml 12.0.0 pypi_0 pypi
[conda] pytorchvideo 0.1.5 pypi_0 pypi
[conda] pyzmq 26.2.1 pypi_0 pypi
[conda] sentence-transformers 3.4.1 pypi_0 pypi
[conda] torch 2.5.0+cu124 pypi_0 pypi
[conda] torchaudio 2.5.0+cu124 pypi_0 pypi
[conda] torchcodec 0.1.0 pypi_0 pypi
[conda] torchvision 0.20.0+cu124 pypi_0 pypi
[conda] transformers 4.49.0 pypi_0 pypi
[conda] transformers-stream-generator 0.0.5 pypi_0 pypi
[conda] triton 3.1.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.7.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV8 NV8 NV8 NV8 NV8 NV8 NV8 SYS PHB PHB PHB PHB SYS SYS SYS SYS 0-57 0 N/A
GPU1 NV8 X NV8 NV8 NV8 NV8 NV8 NV8 SYS PHB PHB PHB PHB SYS SYS SYS SYS 0-57 0 N/A
GPU2 NV8 NV8 X NV8 NV8 NV8 NV8 NV8 SYS PHB PHB PHB PHB SYS SYS SYS SYS 0-57 0 N/A
GPU3 NV8 NV8 NV8 X NV8 NV8 NV8 NV8 SYS PHB PHB PHB PHB SYS SYS SYS SYS 0-57 0 N/A
GPU4 NV8 NV8 NV8 NV8 X NV8 NV8 NV8 SYS SYS SYS SYS SYS PHB PHB PHB PHB 58-115 1 N/A
GPU5 NV8 NV8 NV8 NV8 NV8 X NV8 NV8 SYS SYS SYS SYS SYS PHB PHB PHB PHB 58-115 1 N/A
GPU6 NV8 NV8 NV8 NV8 NV8 NV8 X NV8 SYS SYS SYS SYS SYS PHB PHB PHB PHB 58-115 1 N/A
GPU7 NV8 NV8 NV8 NV8 NV8 NV8 NV8 X SYS SYS SYS SYS SYS PHB PHB PHB PHB 58-115 1 N/A
NIC0 SYS SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS SYS SYS SYS SYS
NIC1 PHB PHB PHB PHB SYS SYS SYS SYS SYS X PHB PHB PHB SYS SYS SYS SYS
NIC2 PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB X PHB PHB SYS SYS SYS SYS
NIC3 PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB PHB X PHB SYS SYS SYS SYS
NIC4 PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB PHB PHB X SYS SYS SYS SYS
NIC5 SYS SYS SYS SYS PHB PHB PHB PHB SYS SYS SYS SYS SYS X PHB PHB PHB
NIC6 SYS SYS SYS SYS PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB X PHB PHB
NIC7 SYS SYS SYS SYS PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB PHB X PHB
NIC8 SYS SYS SYS SYS PHB PHB PHB PHB SYS SYS SYS SYS SYS PHB PHB PHB X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NVIDIA_VISIBLE_DEVICES=GPU-461efaa0-904b-b9bc-1d9e-a8211ab74248,GPU-6b4eee32-4b9d-64d7-bc92-c7967a752cc5,GPU-40e681c0-37a8-6846-bc72-6da1ae62ea6d,GPU-2bb0e965-8e04-c6ce-4f84-e536b13d378c,GPU-50e841f5-96a3-a386-d38b-9332a4ffe2b4,GPU-eac870db-86c9-979b-8560-906489457607,GPU-03bce450-81a1-5f7a-cab2-07453226d367,GPU-e47c9312-5289-a447-efeb-888130c4c520
NVIDIA_REQUIRE_CUDA=cuda>=12.3 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536
NCCL_VERSION=2.20.5-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NVIDIA_PRODUCT_NAME=CUDA
CUDA_VERSION=12.3.2
LD_LIBRARY_PATH=/home/zhangzhicheng03/anaconda3/envs/videva/lib/python3.11/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NCCL_IB_DISABLE=1
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
The process shuts down immediately after printing the NCCL INFO-level messages during Qwen2.5-VL-72B inference with tensor_parallel_size=4, with no error message or traceback.
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Can you follow https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html to get more detailed logs? cc @youkaichao
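For reference, the troubleshooting guide linked above suggests enabling more verbose logging via environment variables before launching. A minimal sketch (variable names follow the vLLM troubleshooting docs; which subset to enable depends on the failure mode):

```shell
# Enable verbose debug output for vLLM, CUDA, and NCCL before launching inference.
export VLLM_LOGGING_LEVEL=DEBUG   # vLLM debug-level logs
export CUDA_LAUNCH_BLOCKING=1     # make CUDA errors surface at the failing call
export NCCL_DEBUG=TRACE           # full NCCL tracing instead of INFO
export VLLM_TRACE_FUNCTION=1      # record every Python function call (very slow; debugging only)
```

Note that `VLLM_TRACE_FUNCTION=1` appears to already be set in the log below, given the "Trace frame log is saved to ..." lines.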
Thanks for the reply; here is the log.
Run command: /home/zhangzhicheng03/anaconda3/envs/videva/bin/python /home/zhangzhicheng03/code/face-llm/ms-swift/swift/cli/infer.py --ckpt_dir /home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct --infer_backend vllm --val_dataset /home/zhangzhicheng03/code/face-llm/all_anno_clean_v2/QA_train_split/QA_training_21.json --gpu_memory_utilization 0.8 --torch_dtype bfloat16 --max_new_tokens 2048 --max-num-seqs 16 --streaming False --max_batch_size 8 --tensor_parallel_size 4 --result_path /home/zhangzhicheng03/code/face-llm/qwenvl/QA_ver_res_train/QA_training_21.json --attn_impl flash_attn --limit_mm_per_prompt {"image": 0, "video": 1} --max_model_len 32768 --model_type qwen2_5_vl
[INFO:swift] Successfully registered /home/zhangzhicheng03/code/face-llm/ms-swift/swift/llm/dataset/data/dataset_info.json
[WARNING:swift] The --ckpt_dir parameter will be removed in ms-swift>=3.2. Please use --model, --adapters.
[INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Loading the model using model_dir: /home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct
[INFO:swift] Because len(args.val_dataset) > 0, setting split_dataset_ratio: 0.0
[INFO:swift] Setting args.eval_human: False
[INFO:swift] Global seed set to 42
[INFO:swift] args: InferArguments(model='/home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct', model_type='qwen2_5_vl', model_revision=None, task_type='causal_lm', torch_dtype=torch.bfloat16, attn_impl='flash_attn', num_labels=None, rope_scaling=None, device_map=None, local_repo_path=None, template='qwen2_5_vl', system=None, max_length=None, truncation_strategy='delete', max_pixels=None, tools_prompt='react_en', norm_bbox=None, padding_side='right', loss_scale='default', sequence_parallel_size=1, use_chat_template=True, template_backend='swift', dataset=[], val_dataset=['/home/zhangzhicheng03/code/face-llm/all_anno_clean_v2/QA_train_split/QA_training_21.json'], split_dataset_ratio=0.0, data_seed=42, dataset_num_proc=1, streaming=False, enable_cache=False, download_mode='reuse_dataset_if_exists', columns={}, strict=False, remove_unused_columns=True, model_name=[None, None], model_author=[None, None], custom_dataset_info=[], quant_method=None, quant_bits=None, hqq_axis=None, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, temperature=None, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stream=False, stop_words=[], logprobs=False, top_logprobs=None, ckpt_dir=None, load_dataset_config=None, lora_modules=[], tuner_backend='peft', train_type='lora', adapters=[], seed=42, model_kwargs={}, load_args=True, load_data_args=False, use_hf=False, hub_token=None, custom_register_path=[], ignore_args_error=False, use_swift_lora=False, tp=1, session_len=None, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, gpu_memory_utilization=0.8, tensor_parallel_size=4, pipeline_parallel_size=1, max_num_seqs=16, max_model_len=32768, disable_custom_all_reduce=False, enforce_eager=False, limit_mm_per_prompt={'image': 0, 'video': 1}, vllm_max_lora_rank=16, enable_prefix_caching=False, merge_lora=False, safe_serialization=True, 
max_shard_size='5GB', infer_backend='vllm', result_path='/home/zhangzhicheng03/code/face-llm/qwenvl/QA_ver_res_train/QA_training_21.json', metric=None, max_batch_size=8, ddp_backend=None, val_dataset_sample=None)
[INFO:swift] Loading the model using model_dir: /home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
[INFO:swift] Setting image_factor: 28. You can adjust this hyperparameter through the environment variable: IMAGE_FACTOR.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: MIN_PIXELS.
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: MAX_PIXELS.
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: MAX_RATIO.
[INFO:swift] Setting video_min_pixels: 100352. You can adjust this hyperparameter through the environment variable: VIDEO_MIN_PIXELS.
[INFO:swift] Using environment variable VIDEO_MAX_PIXELS, Setting video_max_pixels: 100352.
[INFO:swift] Setting video_total_pixels: 100352. You can adjust this hyperparameter through the environment variable: VIDEO_TOTAL_PIXELS.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: FRAME_FACTOR.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: FPS.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: FPS_MIN_FRAMES.
[INFO:swift] Using environment variable FPS_MAX_FRAMES, Setting fps_max_frames: 16.
DEBUG 04-27 03:13:24 init.py:28] No plugins for group vllm.platform_plugins found.
INFO 04-27 03:13:24 init.py:207] Automatically detected platform cuda.
DEBUG 04-27 03:13:24 init.py:28] No plugins for group vllm.general_plugins found.
INFO 04-27 03:13:30 config.py:549] This model supports multiple tasks: {'generate', 'classify', 'score', 'embed', 'reward'}. Defaulting to 'generate'.
INFO 04-27 03:13:30 config.py:1382] Defaulting to use mp for distributed inference
INFO 04-27 03:13:30 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct', speculative_config=None, tokenizer='/home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/zhangzhicheng03/HuggingFace/VideoLLM/models--Qwen--Qwen2.5-VL-72B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[16,8,4,2,1],"max_capture_size":16}, use_cached_outputs=False,
WARNING 04-27 03:13:30 multiproc_worker_utils.py:300] Reducing Torch parallelism from 58 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-27 03:13:30 custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
WARNING 04-27 03:13:30 logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
INFO 04-27 03:13:30 logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-ed9df/VLLM_TRACE_FUNCTION_for_process_47298_thread_140669624403200_at_2025-04-27_03:13:30.791136.log
[INFO:swift] Successfully registered /home/zhangzhicheng03/code/face-llm/ms-swift/swift/llm/dataset/data/dataset_info.json
[INFO:swift] Successfully registered /home/zhangzhicheng03/code/face-llm/ms-swift/swift/llm/dataset/data/dataset_info.json
[INFO:swift] Successfully registered /home/zhangzhicheng03/code/face-llm/ms-swift/swift/llm/dataset/data/dataset_info.json
DEBUG 04-27 03:13:36 init.py:28] No plugins for group vllm.platform_plugins found.
INFO 04-27 03:13:36 init.py:207] Automatically detected platform cuda.
DEBUG 04-27 03:13:36 init.py:28] No plugins for group vllm.platform_plugins found.
INFO 04-27 03:13:36 init.py:207] Automatically detected platform cuda.
(VllmWorkerProcess pid=47808) INFO 04-27 03:13:36 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
(VllmWorkerProcess pid=47808) WARNING 04-27 03:13:36 logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
(VllmWorkerProcess pid=47808) INFO 04-27 03:13:36 logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-ed9df/VLLM_TRACE_FUNCTION_for_process_47808_thread_140652367746304_at_2025-04-27_03:13:36.855844.log
(VllmWorkerProcess pid=47806) INFO 04-27 03:13:36 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
(VllmWorkerProcess pid=47806) WARNING 04-27 03:13:36 logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
(VllmWorkerProcess pid=47806) INFO 04-27 03:13:36 logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-ed9df/VLLM_TRACE_FUNCTION_for_process_47806_thread_139679837152512_at_2025-04-27_03:13:36.912702.log
DEBUG 04-27 03:13:36 init.py:28] No plugins for group vllm.platform_plugins found.
INFO 04-27 03:13:36 init.py:207] Automatically detected platform cuda.
(VllmWorkerProcess pid=47807) INFO 04-27 03:13:37 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
(VllmWorkerProcess pid=47807) WARNING 04-27 03:13:37 logger.py:202] VLLM_TRACE_FUNCTION is enabled. It will record every function executed by Python. This will slow down the code. It is suggested to be used for debugging hang or crashes only.
(VllmWorkerProcess pid=47807) INFO 04-27 03:13:37 logger.py:206] Trace frame log is saved to /tmp/root/vllm/vllm-instance-ed9df/VLLM_TRACE_FUNCTION_for_process_47807_thread_140082128848128_at_2025-04-27_03:13:37.043274.log
(VllmWorkerProcess pid=47808) DEBUG 04-27 03:13:37 init.py:28] No plugins for group vllm.general_plugins found.
(VllmWorkerProcess pid=47806) DEBUG 04-27 03:13:37 init.py:28] No plugins for group vllm.general_plugins found.
INFO 04-27 03:13:37 cuda.py:229] Using Flash Attention backend.
DEBUG 04-27 03:13:37 config.py:3461] enabled custom ops: Counter()
DEBUG 04-27 03:13:37 config.py:3463] disabled custom ops: Counter()
(VllmWorkerProcess pid=47807) DEBUG 04-27 03:13:37 init.py:28] No plugins for group vllm.general_plugins found.
(VllmWorkerProcess pid=47808) INFO 04-27 03:13:43 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=47808) DEBUG 04-27 03:13:43 config.py:3461] enabled custom ops: Counter()
(VllmWorkerProcess pid=47808) DEBUG 04-27 03:13:43 config.py:3463] disabled custom ops: Counter()
(VllmWorkerProcess pid=47806) INFO 04-27 03:13:43 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=47806) DEBUG 04-27 03:13:43 config.py:3461] enabled custom ops: Counter()
(VllmWorkerProcess pid=47806) DEBUG 04-27 03:13:43 config.py:3463] disabled custom ops: Counter()
(VllmWorkerProcess pid=47807) INFO 04-27 03:13:44 cuda.py:229] Using Flash Attention backend.
(VllmWorkerProcess pid=47807) DEBUG 04-27 03:13:44 config.py:3461] enabled custom ops: Counter()
(VllmWorkerProcess pid=47807) DEBUG 04-27 03:13:44 config.py:3463] disabled custom ops: Counter()
DEBUG 04-27 03:13:44 parallel_state.py:810] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://10.252.128.175:65209 backend=nccl
(VllmWorkerProcess pid=47808) DEBUG 04-27 03:13:44 parallel_state.py:810] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://10.252.128.175:65209 backend=nccl
(VllmWorkerProcess pid=47806) DEBUG 04-27 03:13:44 parallel_state.py:810] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://10.252.128.175:65209 backend=nccl
(VllmWorkerProcess pid=47807) DEBUG 04-27 03:13:44 parallel_state.py:810] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://10.252.128.175:65209 backend=nccl
(VllmWorkerProcess pid=47808) INFO 04-27 03:13:45 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=47808) INFO 04-27 03:13:45 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=47807) INFO 04-27 03:13:45 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=47807) INFO 04-27 03:13:45 pynccl.py:69] vLLM is using nccl==2.21.5
INFO 04-27 03:13:45 utils.py:916] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=47806) INFO 04-27 03:13:45 utils.py:916] Found nccl from library libnccl.so.2
INFO 04-27 03:13:45 pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=47806) INFO 04-27 03:13:45 pynccl.py:69] vLLM is using nccl==2.21.5
a800bcctest0136-bd:47298:47298 [0] NCCL INFO Bootstrap : Using eth0:10.252.128.175<0>
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NET/Plugin: Using internal network plugin.
a800bcctest0136-bd:47298:47298 [0] NCCL INFO cudaDriverVersion 12020
NCCL version 2.21.5+cuda12.4
a800bcctest0136-bd:47808:47808 [3] NCCL INFO cudaDriverVersion 12020
a800bcctest0136-bd:47807:47807 [2] NCCL INFO cudaDriverVersion 12020
a800bcctest0136-bd:47806:47806 [1] NCCL INFO cudaDriverVersion 12020
a800bcctest0136-bd:47808:47808 [3] NCCL INFO Bootstrap : Using eth0:10.252.128.175<0>
a800bcctest0136-bd:47807:47807 [2] NCCL INFO Bootstrap : Using eth0:10.252.128.175<0>
a800bcctest0136-bd:47806:47806 [1] NCCL INFO Bootstrap : Using eth0:10.252.128.175<0>
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NET/Plugin: Using internal network plugin.
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NET/Plugin: Using internal network plugin.
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NET/Plugin: Using internal network plugin.
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NET/Socket : Using [0]eth0:10.252.128.175<0> [1]kflax:11.43.252.176<0> [2]kflax-vxlan:fe80::34a6:77ff:fef3:8a98%kflax-vxlan<0>
a800bcctest0136-bd:47808:47808 [3] NCCL INFO Using non-device net plugin version 0
a800bcctest0136-bd:47808:47808 [3] NCCL INFO Using network Socket
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NET/Socket : Using [0]eth0:10.252.128.175<0> [1]kflax:11.43.252.176<0> [2]kflax-vxlan:fe80::34a6:77ff:fef3:8a98%kflax-vxlan<0>
a800bcctest0136-bd:47807:47807 [2] NCCL INFO Using non-device net plugin version 0
a800bcctest0136-bd:47807:47807 [2] NCCL INFO Using network Socket
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NET/Socket : Using [0]eth0:10.252.128.175<0> [1]kflax:11.43.252.176<0> [2]kflax-vxlan:fe80::34a6:77ff:fef3:8a98%kflax-vxlan<0>
a800bcctest0136-bd:47298:47298 [0] NCCL INFO Using non-device net plugin version 0
a800bcctest0136-bd:47298:47298 [0] NCCL INFO Using network Socket
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NET/Socket : Using [0]eth0:10.252.128.175<0> [1]kflax:11.43.252.176<0> [2]kflax-vxlan:fe80::34a6:77ff:fef3:8a98%kflax-vxlan<0>
a800bcctest0136-bd:47806:47806 [1] NCCL INFO Using non-device net plugin version 0
a800bcctest0136-bd:47806:47806 [1] NCCL INFO Using network Socket
a800bcctest0136-bd:47298:47298 [0] NCCL INFO ncclCommInitRank comm 0x13593af0 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 61000 commId 0x45f3f6986cbbbd5 - Init START
a800bcctest0136-bd:47806:47806 [1] NCCL INFO ncclCommInitRank comm 0xe47a2f0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 62000 commId 0x45f3f6986cbbbd5 - Init START
a800bcctest0136-bd:47807:47807 [2] NCCL INFO ncclCommInitRank comm 0xf456a40 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId 6b000 commId 0x45f3f6986cbbbd5 - Init START
a800bcctest0136-bd:47808:47808 [3] NCCL INFO ncclCommInitRank comm 0xeeb36f0 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId 6c000 commId 0x45f3f6986cbbbd5 - Init START
a800bcctest0136-bd:47806:47806 [1] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
a800bcctest0136-bd:47808:47808 [3] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
a800bcctest0136-bd:47807:47807 [2] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
a800bcctest0136-bd:47298:47298 [0] NCCL INFO NCCL_CUMEM_ENABLE set by environment to 0.
/home/zhangzhicheng03/anaconda3/envs/videva/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!