TensorRT-LLM
Support internlm2
After making the corresponding changes, the engine build fails. How can I resolve this?
My commands are:
python3 convert_checkpoint.py --model_dir /mnt/checkpoint/models/internlm2-chat-20b/ --dtype float16 --output_dir /mnt/tllm_checkpoint/internlm2-chat-20b/tllm_checkpoint_1gpu_tp1
trtllm-build --checkpoint_dir /mnt/tllm_checkpoint/internlm2-chat-20b/tllm_checkpoint_1gpu_tp1/ --output_dir /mnt/trt_engines/internlm2-chat-20b/fp16/1-gpu/ --gemm_plugin float16 --max_batch_size=32 --max_input_len=2048 --max_output_len=2048
@PaulX1029 Hi, have you rebuilt and reinstalled tensorrt-llm? You can find the installation location with pip3 show tensorrt_llm, then check tensorrt_llm/models/__init__.py to see whether MODEL_MAP is as expected.
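A quick way to verify this against the installed package, as a minimal sketch (the architecture key "InternLM2ForCausalLM" is an assumption; the exact name registered in MODEL_MAP depends on the PR):

```python
# Check that the installed tensorrt_llm carries the InternLM2 mapping.
import tensorrt_llm
from tensorrt_llm.models import MODEL_MAP

# Shows where the package is installed (same info as pip3 show).
print(tensorrt_llm.__file__)

# Should print a model class, not None, if the patched build is active.
# "InternLM2ForCausalLM" is the assumed registry key here.
print(MODEL_MAP.get("InternLM2ForCausalLM"))
```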
@RunningLeon Which build method did you use? I installed trtllm from pip, and I'd like to align with your build method and try again.
https://github.com/NVIDIA/TensorRT-LLM/pull/266#issuecomment-2022311767
@RunningLeon Thanks a lot for your work! I'd like to ask whether the internlm2-20b network in InternVL-1.5 differs from the regular internlm2-20b. After converting it with the script in this PR, the output is all garbled.
@RunningLeon Hi, we fine-tuned the internlm2 model with LoRA. We can convert the base model to the llama format, but not the LoRA part; we tried modifying InternLM/tools/convert2llama.py to convert the LoRA weights to the llama style, but it did not work. Is there any other tool that works for LoRA? Note: we can't merge the base model and LoRA into one model, because we want to use https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-several-lora-checkpoints. So if we can convert the internlm2 LoRA to the llama style, we can then use examples/hf_lora_convert.py to build with TRT-LLM.
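One possible direction is renaming the adapter's weight keys from InternLM2 naming to llama naming before running examples/hf_lora_convert.py. A minimal sketch, assuming a PEFT-style adapter checkpoint and the usual InternLM2-to-llama name mapping; the filename and the mapping table are assumptions, and the fused attention.wqkv adapter cannot be renamed one-to-one (its lora_B would have to be split the same way the base wqkv is split):

```python
# Hypothetical key-renaming pass for an InternLM2 LoRA adapter.
# The mapping mirrors the base-model conversion and is an assumption;
# verify it against the actual keys in your adapter checkpoint.
import torch

RENAMES = {
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w3": "mlp.up_proj",
    "feed_forward.w2": "mlp.down_proj",
}

def rename_lora_keys(state_dict):
    out = {}
    for key, tensor in state_dict.items():
        if "attention.wqkv" in key:
            # Fused q/k/v LoRA weights need a real split, not a rename.
            raise NotImplementedError("split wqkv LoRA weights first")
        new_key = key
        for old, new in RENAMES.items():
            new_key = new_key.replace(old, new)
        out[new_key] = tensor
    return out

# "adapter_model.bin" is the PEFT default name; adjust to your path.
adapter = torch.load("adapter_model.bin", map_location="cpu")
torch.save(rename_lora_keys(adapter), "adapter_model_llama.bin")
```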
@nv-guomingz Hi, sorry to bother you, but when will this PR be merged? Do I need to fix the conflicts? THX
@nv-guomingz hi, the conflicts with main branch are resolved. Looking forward to your review comments. THX.
@RunningLeon May I ask why internlm2 needs its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.
Hi, in internlm2 the W_qkv weights are fused together, and some parameter names are not aligned with llama's, so llama's convert_checkpoint.py can't be used directly.
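To illustrate the fused-weight point: a minimal sketch of splitting a fused wqkv tensor into separate q/k/v projections. The per-KV-group packing [q_0 ... q_{r-1}, k, v] and the example shapes are assumptions based on InternLM2's GQA layout; the PR's convert_checkpoint.py is the authoritative reference:

```python
# Illustrative split of InternLM2's fused wqkv into q/k/v (GQA layout).
# Assumes rows are packed per KV group as [q * r, k, v] where
# r = num_heads // num_kv_heads.
import torch

def split_wqkv(wqkv, num_heads, num_kv_heads, head_dim):
    hidden = wqkv.shape[-1]
    r = num_heads // num_kv_heads  # query heads per KV group
    w = wqkv.view(num_kv_heads, r + 2, head_dim, hidden)
    q = w[:, :r].reshape(num_heads * head_dim, hidden)
    k = w[:, r].reshape(num_kv_heads * head_dim, hidden)
    v = w[:, r + 1].reshape(num_kv_heads * head_dim, hidden)
    return q, k, v

# Example with internlm2-chat-20b-like shapes (48 query heads,
# 8 KV heads, head_dim 128, hidden 6144 -- values for illustration).
wqkv = torch.randn((48 + 2 * 8) * 128, 6144)
q, k, v = split_wqkv(wqkv, 48, 8, 128)
print(q.shape, k.shape, v.shape)
```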
Thank you for this explanation!
Hi @RunningLeon, sorry for the late response due to internal task priorities. Would you please rebase the code first? I'll try to merge your MR into the main branch this week.
@nv-guomingz Done. Hope merging with main is OK.
Thanks @RunningLeon. Could you please squash your commits into a single commit? That would make further integration easier.
Hi @RunningLeon, I've filed the merge request in our internal repo and testing is ongoing. If everything goes well, this MR will be upstreamed next week. Thanks again for your contribution.
@RunningLeon InternLM2 has been added in today's update. Please see the notes here: https://github.com/NVIDIA/TensorRT-LLM/discussions/1726#discussion-6776859