INFO 05-15 11:04:11 [model_runner.py:1110] Starting to load model ../Qwen2.5-7B-Instruct...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.37it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.41it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.36it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.35it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.36it/s]
INFO 05-15 11:04:14 [loader.py:447] Loading weights took 2.97 seconds
INFO 05-15 11:04:14 [model_runner.py:1146] Model loading took 14.2487 GB and 3.016548 seconds
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/work/WeClone-master/venv/bin/weclone-cli", line 8, in <module>
[rank0]: sys.exit(cli())
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
[rank0]: return self.main(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/click/core.py", line 1082, in main
[rank0]: rv = self.invoke(ctx)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
[rank0]: return _process_result(sub_ctx.command.invoke(sub_ctx))
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
[rank0]: return ctx.invoke(self.callback, **ctx.params)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/click/core.py", line 788, in invoke
[rank0]: return __callback(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/weclone/cli.py", line 26, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/weclone/cli.py", line 47, in qa_generator
[rank0]: processor.main()
[rank0]: File "/home/work/WeClone-master/weclone/data/qa_generator.py", line 98, in main
[rank0]: self.clean_strategy.judge(qa_res)
[rank0]: File "/home/work/WeClone-master/weclone/data/clean/strategies.py", line 46, in judge
[rank0]: outputs = infer(
[rank0]: File "/home/work/WeClone-master/weclone/core/inference/vllm_infer.py", line 130, in infer
[rank0]: results = LLM(**engine_args).generate(inputs, sampling_params, lora_request=lora_request)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/utils.py", line 1037, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 243, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 520, in from_engine_args
[rank0]: return engine_cls.from_vllm_config(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 496, in from_vllm_config
[rank0]: return cls(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 283, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 432, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 102, in determine_num_available_blocks
[rank0]: results = self.collective_rpc("determine_num_available_blocks")
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[rank0]: answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/utils.py", line 2255, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 229, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1243, in profile_run
[rank0]: self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1354, in _dummy_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1742, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 462, in forward
[rank0]: hidden_states = self.model(input_ids, positions, intermediate_tensors,
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 172, in __call__
[rank0]: return self.forward(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 338, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 243, in forward
[rank0]: hidden_states = self.self_attn(
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 174, in forward
[rank0]: qkv, _ = self.qkv_proj(hidden_states)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 474, in forward
[rank0]: output_parallel = self.quant_method.apply(self, input, bias)
[rank0]: File "/home/work/WeClone-master/venv/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 191, in apply
[rank0]: return F.linear(x, layer.weight, bias)
[rank0]: RuntimeError: CUDA error: no kernel image is available for execution on the device
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Can a P40 run this?
CUDA 12.9
Qwen2.5-7B
Running weclone-cli make-dataset fails with the error above.
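
One possible cause of "no kernel image is available for execution on the device" is that the installed PyTorch build was not compiled for the GPU's architecture (the Tesla P40 is a Pascal card, compute capability 6.1). The sketch below is a rough diagnostic, not part of WeClone or vLLM: the helper names `parse_arch` and `has_kernel_image` are hypothetical, and the check only looks at PyTorch's own architecture list, which says nothing about kernels compiled separately by vLLM.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_86' or 'compute_90' into (kind, (major, minor))."""
    kind, _, num = tag.partition("_")
    # Keep digits only, so suffixed tags such as 'sm_90a' also parse.
    digits = "".join(ch for ch in num if ch.isdigit())
    return kind, (int(digits[:-1]), int(digits[-1]))


def has_kernel_image(capability, arch_list):
    """Hypothetical helper: can a device with `capability` run code built for `arch_list`?

    Simplified rule (an assumption of this sketch):
      - an 'sm_XY' binary runs on devices with the same major version
        and an equal or higher minor version;
      - 'compute_XY' PTX can be JIT-compiled for any equal or newer device.
    """
    capability = tuple(capability)
    for tag in arch_list:
        kind, (major, minor) = parse_arch(tag)
        if kind == "sm" and major == capability[0] and minor <= capability[1]:
            return True
        if kind == "compute" and (major, minor) <= capability:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a visible CUDA device is required for the live check.")
    else:
        cap = torch.cuda.get_device_capability(0)   # e.g. (6, 1) on a P40
        archs = torch.cuda.get_arch_list()          # architectures this build targets
        print("device:", torch.cuda.get_device_name(0), "capability:", cap)
        print("PyTorch built for:", archs)
        print("kernel image likely available:", has_kernel_image(cap, archs))
```

If the last line prints `False` inside the same virtual environment that runs `weclone-cli`, the PyTorch wheel is a likely culprit. If it prints `True`, the mismatch is probably elsewhere, for example in vLLM's own compiled kernels.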