Question about converting Qwen2-7B
When I run `python convert_checkpoint.py --model_dir ./hug_ckpts/Qwen2-7B --output_dir ./qwen2-7b-trt --dtype float16`, I get this error:
File "TensorRT-LLM-main/examples/qwen/convert_checkpoint.py", line 267, in convert_and_save_rank
mapping = Mapping(world_size=world_size,
TypeError: Mapping.__init__() got an unexpected keyword argument 'moe_tp_size'
@sky-fly97 Which trtllm version are you using? Could you please try the main branch?
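One way to find the installed version, so it can be compared against the version of the `examples/` checkout (a minimal standard-library sketch; the distribution name `tensorrt-llm` is assumed):

```python
# Print the installed tensorrt_llm wheel version, or a placeholder
# if the package is not installed in this environment.
import importlib.metadata as md

try:
    version = md.version("tensorrt-llm")
except md.PackageNotFoundError:
    version = "not installed"
print(version)
```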
I got the same error. nvcr.io/nvidia/tritonserver:24.03-py3 tensorrt-llm 0.10.0 Qwen2-1.5B-instruct
Same here. I think this argument is no longer compatible with version 0.10.0, since the requirements file pins 0.12.0.dev2024071600. But even after installing all the requirements I get:
from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
For the issue
File "TensorRT-LLM-main/examples/qwen/convert_checkpoint.py", line 267, in convert_and_save_rank
mapping = Mapping(world_size=world_size,
TypeError: Mapping.__init__() got an unexpected keyword argument 'moe_tp_size'
this is most likely because the example scripts and the installed TRT-LLM come from different versions (e.g., the examples are from the trt-llm 0.12.0 dev branch, but the installed core trt-llm is 0.10.0).
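To illustrate the mismatch with a hypothetical minimal sketch (this is a stand-in class, not the real `tensorrt_llm.Mapping`): the 0.12-era example passes `moe_tp_size` to `Mapping`, but an older `Mapping.__init__` has no such parameter, so Python raises exactly this `TypeError`:

```python
# Hypothetical stand-in for a pre-moe_tp_size Mapping class.
class Mapping:
    def __init__(self, world_size=1, tp_size=1, pp_size=1):
        self.world_size = world_size
        self.tp_size = tp_size
        self.pp_size = pp_size

# Calling it the way the newer example code does fails immediately:
try:
    Mapping(world_size=1, moe_tp_size=1)
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'moe_tp_size'
```

So the fix is not in the script itself: the installed wheel and the examples checkout just need to match.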
For the second issue
from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
This is a common issue when installing trt-llm from PyPI inside the tritonserver Docker image. Could you try installing tensorrt_llm from source?
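A rough sketch of a from-source install (the `--trt_root` path is an assumption and may differ per container image; check the repo's build docs for your setup):

```shell
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
# Build the wheel against the TensorRT already present in the container
# (adjust --trt_root to where TensorRT lives in your image).
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
pip install ./build/tensorrt_llm-*.whl
```

Building from source ties the `bindings` extension to the torch actually present in the container, which avoids the `undefined symbol` ImportError above.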
I got the same error. nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3 tensorrt-llm 0.12.0 Qwen2-7B-instruct
@sky-fly97 @chengshan008 Could you try the above suggestions?