[Bug]: `undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE` when running `0.7.3.dev57+g2ae88905.precompiled` on A100
Your current environment
The output of `python collect_env.py`
INFO 02-10 17:07:03 __init__.py:190] Automatically detected platform cuda.
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Clang version: Could not collect
CMake version: version 3.31.1
Libc version: glibc-2.39
Python version: 3.12.3 (main, Nov 6 2024, 18:32:19) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.85
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 565.57.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.6.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) Platinum 8480+
BIOS Model name: Intel(R) Xeon(R) Platinum 8480+ CPU @ 2.0GHz
BIOS CPU family: 179
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 6
CPU(s) scaling MHz: 23%
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 5.3 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 224 MiB (112 instances)
L3 cache: 210 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-55,112-167
NUMA node1 CPU(s): 56-111,168-223
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cudnn-frontend==1.8.0
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-dali-cuda120==1.44.0
[pip3] nvidia-modelopt==0.21.0
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvimgcodec-cu12==0.3.0.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.17.0
[pip3] optree==0.13.1
[pip3] pynvml==11.4.1
[pip3] pytorch-triton==3.0.0+72734f086
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torch_tensorrt==2.6.0a0
[pip3] torchaudio==2.5.1
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.20.1
[pip3] transformers==4.49.0.dev0
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.7.3.dev57+g2ae88905
vLLM Build Flags:
CUDA Archs: 7.0 7.5 8.0 8.6 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-55,112-167 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NVIDIA_VISIBLE_DEVICES=all
CUBLAS_VERSION=12.6.4.1
NVIDIA_REQUIRE_CUDA=cuda>=9.0
CUDA_CACHE_DISABLE=1
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.6 9.0+PTX
NCCL_VERSION=2.23.4
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
NVIDIA_PRODUCT_NAME=PyTorch
CUDA_VERSION=12.6.3.004
PYTORCH_VERSION=2.6.0a0+df5bbc0
PYTORCH_BUILD_NUMBER=0
CUDNN_FRONTEND_VERSION=1.8.0
CUDNN_VERSION=9.6.0.74
PYTORCH_HOME=/opt/pytorch/pytorch
LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/cv2/../../lib64:/usr/local/lib/python3.12/dist-packages/torch/lib:/usr/local/lib/python3.12/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_BUILD_ID=126674149
CUDA_DRIVER_VERSION=560.35.05
PYTORCH_BUILD_VERSION=2.6.0a0+df5bbc0
CUDA_HOME=/usr/local/cuda
CUDA_MODULE_LOADING=LAZY
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=
NVIDIA_PYTORCH_VERSION=24.12
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1
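The listing above reports `PyTorch version: 2.5.1+cu124`, while the container variables advertise `PYTORCH_VERSION=2.6.0a0+df5bbc0`. That suggests the editable install may have replaced the container's own PyTorch build, which is an assumption worth checking before digging further. The minimal sketch below compares the two. It assumes the NGC image sets `PYTORCH_VERSION` as shown above; `compare_torch` and `installed_torch_version` are hypothetical helper names, not part of vLLM or PyTorch.

```python
import os
from importlib import metadata
from typing import Optional


def installed_torch_version() -> Optional[str]:
    """Version of the torch distribution that pip sees, or None if it is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def compare_torch(expected: Optional[str], installed: Optional[str]) -> str:
    """Compare the image's advertised PyTorch build with the installed one.

    `expected` is assumed to come from the PYTORCH_VERSION variable set by the
    NGC PyTorch image (as in the listing above); other images may not set it.
    """
    if expected is None:
        return "PYTORCH_VERSION is unset; cannot tell which torch the image shipped"
    if installed is None:
        return "torch is not installed"
    if installed == expected:
        return "ok"
    return f"torch was replaced: image ships {expected}, environment has {installed}"


if __name__ == "__main__":
    print(compare_torch(os.environ.get("PYTORCH_VERSION"), installed_torch_version()))
```

An exact string comparison is used on purpose. Any difference, even in the local build tag, means extensions compiled against the image's torch (such as the preinstalled `flash_attn`) may no longer match the torch that is actually loaded.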
🐛 Describe the bug
This is a follow-up to #12847.
Using the main branch at commit 2ae889052c6d0205ca677052ddb41db96a2a2620, we are hitting `ImportError: /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`. The details of the environment and the test are given below.
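Under the usual Itanium C++ mangling, the missing symbol demangles to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>)`. The `std::__cxx11` namespace appears only with libstdc++'s newer string ABI (`_GLIBCXX_USE_CXX11_ABI=1`). Under the old ABI, the same constructor would be mangled with `Ss` instead, so the loader cannot find it. One plausible reading is that `flash_attn_2_cuda` was built against the container's torch with the new ABI, while the torch now loaded uses the old one. That is an assumption, not something this report confirms; PyPI torch wheels before 2.7 used the old ABI.

The sketch below extracts the symbol from the error message and compares it with `torch.compiled_with_cxx11_abi()`. The helper names are hypothetical.

```python
import importlib.util
import re
from typing import Optional

# The dynamic loader reports the missing name after "undefined symbol: ".
_UNDEFINED = re.compile(r"undefined symbol: (\S+)")


def undefined_symbol(message: str) -> Optional[str]:
    """Pull the mangled symbol out of an ImportError message, if there is one."""
    match = _UNDEFINED.search(message)
    return match.group(1) if match else None


def expects_cxx11_abi(symbol: str) -> bool:
    """True if the mangled name references std::__cxx11 (libstdc++'s new string ABI)."""
    return "__cxx11" in symbol


def torch_cxx11_abi() -> Optional[bool]:
    """ABI flag of the torch that would be imported, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.compiled_with_cxx11_abi()


def diagnose(message: str, torch_abi: Optional[bool]) -> str:
    """Compare the ABI the extension expects with the ABI torch was built with."""
    symbol = undefined_symbol(message)
    if symbol is None:
        return "no undefined symbol in message"
    wanted = expects_cxx11_abi(symbol)
    if torch_abi is None:
        return f"extension expects cxx11_abi={wanted}; torch ABI unknown"
    if wanted != torch_abi:
        return f"ABI mismatch: extension expects cxx11_abi={wanted}, torch has cxx11_abi={torch_abi}"
    return "ABIs agree; look for a torch version mismatch instead"
```

In the affected container, `diagnose(str(err), torch_cxx11_abi())` would be called on the caught `ImportError`.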
Adding @youkaichao since I suspect #12963 may have caused this.
[!NOTE] This issue does NOT happen with the 0.7.1 release. On the same machine, in the same container, changing the installation to `pip install vllm` (or `pip install https://github.com/vllm-project/vllm/releases/download/v0.7.1/vllm-0.7.1-cp38-abi3-manylinux1_x86_64.whl`) works fine.
- Container: `nvcr.io/nvidia/pytorch:24.12-py3`
- Setup:
  - `git clone https://github.com/vllm-project/vllm.git`
  - `cd vllm`
  - `VLLM_USE_PRECOMPILED=1 pip install --editable .`
- Run:
  - `python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model meta-llama/Llama-3.2-3B-Instruct --seed 42 -tp 1 --use-v2-block-manager --max_model_len 2048`
INFO 02-10 17:06:23 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 17:06:24 api_server.py:840] vLLM API server version 0.7.3.dev57+g2ae88905
INFO 02-10 17:06:24 api_server.py:841] args: Namespace(host='0.0.0.0', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='meta-llama/Llama-3.2-3B-Instruct', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=2048, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=42, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, 
max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 02-10 17:06:24 api_server.py:206] Started engine process with PID 837
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 878/878 [00:00<00:00, 10.6MB/s]
INFO 02-10 17:06:27 __init__.py:190] Automatically detected platform cuda.
ERROR 02-10 17:06:29 registry.py:307] Error in inspecting model architecture 'LlamaForCausalLM'
ERROR 02-10 17:06:29 registry.py:307] Traceback (most recent call last):
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 508, in _run_in_subprocess
ERROR 02-10 17:06:29 registry.py:307] returned.check_returncode()
ERROR 02-10 17:06:29 registry.py:307] File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
ERROR 02-10 17:06:29 registry.py:307] raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 02-10 17:06:29 registry.py:307] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 02-10 17:06:29 registry.py:307]
ERROR 02-10 17:06:29 registry.py:307] The above exception was the direct cause of the following exception:
ERROR 02-10 17:06:29 registry.py:307]
ERROR 02-10 17:06:29 registry.py:307] Traceback (most recent call last):
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 305, in _try_inspect_model_cls
ERROR 02-10 17:06:29 registry.py:307] return model.inspect_model_cls()
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 276, in inspect_model_cls
ERROR 02-10 17:06:29 registry.py:307] return _run_in_subprocess(
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 511, in _run_in_subprocess
ERROR 02-10 17:06:29 registry.py:307] raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 02-10 17:06:29 registry.py:307] RuntimeError: Error raised in subprocess:
ERROR 02-10 17:06:29 registry.py:307] <frozen runpy>:128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 02-10 17:06:29 registry.py:307] Traceback (most recent call last):
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 1852, in _get_module
ERROR 02-10 17:06:29 registry.py:307] return importlib.import_module("." + module_name, self.__name__)
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
ERROR 02-10 17:06:29 registry.py:307] return _bootstrap._gcd_import(name[level:], package, level)
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap_external>", line 995, in exec_module
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 49, in <module>
ERROR 02-10 17:06:29 registry.py:307] from .integrations.flash_attention import flash_attention_forward
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/integrations/flash_attention.py", line 5, in <module>
ERROR 02-10 17:06:29 registry.py:307] from ..modeling_flash_attention_utils import _flash_attention_forward
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_flash_attention_utils.py", line 30, in <module>
ERROR 02-10 17:06:29 registry.py:307] from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input # noqa
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/flash_attn/__init__.py", line 3, in <module>
ERROR 02-10 17:06:29 registry.py:307] from flash_attn.flash_attn_interface import (
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
ERROR 02-10 17:06:29 registry.py:307] import flash_attn_2_cuda as flash_attn_cuda
ERROR 02-10 17:06:29 registry.py:307] ImportError: /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
ERROR 02-10 17:06:29 registry.py:307]
ERROR 02-10 17:06:29 registry.py:307] The above exception was the direct cause of the following exception:
ERROR 02-10 17:06:29 registry.py:307]
ERROR 02-10 17:06:29 registry.py:307] Traceback (most recent call last):
ERROR 02-10 17:06:29 registry.py:307] File "<frozen runpy>", line 198, in _run_module_as_main
ERROR 02-10 17:06:29 registry.py:307] File "<frozen runpy>", line 88, in _run_code
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 532, in <module>
ERROR 02-10 17:06:29 registry.py:307] _run()
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 525, in _run
ERROR 02-10 17:06:29 registry.py:307] result = fn()
ERROR 02-10 17:06:29 registry.py:307] ^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 277, in <lambda>
ERROR 02-10 17:06:29 registry.py:307] lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/registry.py", line 280, in load_model_cls
ERROR 02-10 17:06:29 registry.py:307] mod = importlib.import_module(self.module_name)
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
ERROR 02-10 17:06:29 registry.py:307] return _bootstrap._gcd_import(name[level:], package, level)
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap_external>", line 995, in exec_module
ERROR 02-10 17:06:29 registry.py:307] File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/models/llama.py", line 46, in <module>
ERROR 02-10 17:06:29 registry.py:307] from vllm.model_executor.model_loader.weight_utils import (
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/model_loader/__init__.py", line 6, in <module>
ERROR 02-10 17:06:29 registry.py:307] from vllm.model_executor.model_loader.loader import (BaseModelLoader,
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/model_loader/loader.py", line 45, in <module>
ERROR 02-10 17:06:29 registry.py:307] from vllm.model_executor.model_loader.utils import (ParamMapping,
ERROR 02-10 17:06:29 registry.py:307] File "/tmp/vllm/vllm/model_executor/model_loader/utils.py", line 35, in <module>
ERROR 02-10 17:06:29 registry.py:307] module: Optional[transformers.PreTrainedModel] = None) -> bool:
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 1840, in __getattr__
ERROR 02-10 17:06:29 registry.py:307] module = self._get_module(self._class_to_module[name])
ERROR 02-10 17:06:29 registry.py:307] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-10 17:06:29 registry.py:307] File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 1854, in _get_module
ERROR 02-10 17:06:29 registry.py:307] raise RuntimeError(
ERROR 02-10 17:06:29 registry.py:307] RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
ERROR 02-10 17:06:29 registry.py:307] /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
ERROR 02-10 17:06:29 registry.py:307]
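In the second traceback, the failure surfaces while Python evaluates the annotation `Optional[transformers.PreTrainedModel]` in `vllm/model_executor/model_loader/utils.py`. Accessing that attribute makes transformers' lazy module (`import_utils.py`, `__getattr__`) import `transformers.modeling_utils`, which in turn imports `flash_attn`.

The sketch below reproduces only that mechanism. It uses a toy lazy module; `LazyModule`, `uses_model`, and `lazy_pkg` are hypothetical names, not transformers' or vLLM's code. It shows that deferring annotation evaluation (PEP 563, `from __future__ import annotations`) keeps the attribute from being touched at definition time. This illustrates why the import path is reached; it is not a proposed fix, because the `flash_attn` binary would still fail to load if anything else imported it.

```python
import __future__
import sys
import types
from typing import Optional

# PEP 649 (Python 3.14+) defers annotation evaluation by default; on the
# Python 3.12 used in this report, annotations are evaluated when `def` runs.
EAGER_ANNOTATIONS = sys.version_info < (3, 14)


class LazyModule(types.ModuleType):
    """Toy stand-in for a lazily populated package (hypothetical).

    Any attribute not already defined triggers a simulated heavy import that
    fails, much like resolving PreTrainedModel with a broken flash_attn build.
    """

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.accessed: list = []

    def __getattr__(self, attr: str):
        # Only called when normal lookup fails, i.e. for "lazy" attributes.
        self.accessed.append(attr)
        raise ImportError(f"simulated failure importing {self.__name__}.{attr}")


# Mirrors the shape of the annotated signature in the traceback (illustrative names).
SRC = (
    "def uses_model(module: Optional[lazy.PreTrainedModel] = None) -> bool:\n"
    "    return module is None\n"
)


def define(deferred: bool) -> str:
    """Compile SRC with or without `from __future__ import annotations` semantics."""
    lazy = LazyModule("lazy_pkg")
    flags = __future__.annotations.compiler_flag if deferred else 0
    code = compile(SRC, "<sketch>", "exec", flags=flags, dont_inherit=True)
    try:
        exec(code, {"Optional": Optional, "lazy": lazy})
    except ImportError as exc:
        return f"failed at definition time: {exc}"
    return f"defined; lazy attributes touched: {lazy.accessed}"
```

With deferred annotations, the name is resolved only if something later calls `typing.get_type_hints` on the function.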