python convert_checkpoint.py --model_dir /workspace/lk/model/Qwen/14B/ --output_dir ./tllm_checkpoint_1gpu_fp16_wq --dtype float16 --use_weight_only --weight_only_precision int8
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
0.10.0.dev2024042300
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.26it/s]
[04/30/2024-09:44:33] Some parameters are on the meta device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 365, in <module>
    main()
  File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 357, in main
    convert_and_save_hf(args)
  File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 319, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 325, in execute
    f(args, rank)
  File "/workspace/lk/model/tensorRT/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 305, in convert_and_save_rank
    qwen = from_hugging_face(
  File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1087, in from_hugging_face
    weights = load_weights_from_hf(config=config,
  File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1193, in load_weights_from_hf
    weights = convert_hf_qwen(
  File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 747, in convert_hf_qwen
    get_tllm_linear_weight(qkv_w, tllm_prex + 'attention.qkv.',
  File "/opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 487, in get_tllm_linear_weight
    v.cpu(), plugin_weight_only_quant_type)
NotImplementedError: Cannot copy out of meta tensor; no data!
Related to https://github.com/NVIDIA/TensorRT-LLM/issues/1440.
As lkm2835 mentioned, this happens because the machine doesn't have enough CPU memory to hold the full model, so some weights get offloaded to the meta device (which has no backing data). Try a machine with more CPU memory.
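If you want to confirm this diagnosis before rerunning convert_checkpoint.py, you can load the checkpoint with transformers and check whether any parameters landed on the meta device. This is only a minimal sketch, assuming the model path from the command above and device_map="auto" (which lets accelerate offload weights when RAM runs short, roughly the loading behavior that produced the warning in the log):

```python
# Diagnostic sketch, not part of TensorRT-LLM itself.
# Assumptions: the model path from the failing command above, and
# device_map="auto" so accelerate may offload weights when CPU RAM
# runs out, which is what leaves parameters on the meta device.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/workspace/lk/model/Qwen/14B/",  # path used in the failing command
    torch_dtype=torch.float16,
    device_map="auto",
)

# A parameter left on the meta device has no backing data, which is
# exactly what makes the converter raise "Cannot copy out of meta tensor".
meta_params = [name for name, p in model.named_parameters()
               if p.device.type == "meta"]
if meta_params:
    print(f"{len(meta_params)} parameters are on the meta device; "
          "this host lacks the CPU memory to hold the full model.")
else:
    print("All weights materialized; CPU memory looks sufficient.")
```

If the check reports meta-device parameters, the conversion will keep failing on that host no matter which quantization flags you pass.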
Thank you. It's been solved.
LIUKAI0815
How did you solve this problem?