TensorRT-LLM
Support internlm2
After making the corresponding changes, the engine build fails. How can I resolve this?
My commands are:
python3 convert_checkpoint.py --model_dir /mnt/checkpoint/models/internlm2-chat-20b/ --dtype float16 --output_dir /mnt/tllm_checkpoint/internlm2-chat-20b/tllm_checkpoint_1gpu_tp1
trtllm-build --checkpoint_dir /mnt/tllm_checkpoint/internlm2-chat-20b/tllm_checkpoint_1gpu_tp1/ --output_dir /mnt/trt_engines/internlm2-chat-20b/fp16/1-gpu/ --gemm_plugin float16 --max_batch_size=32 --max_input_len=2048 --max_output_len=2048
@PaulX1029 Hi, have you rebuilt and reinstalled tensorrt-llm? You can find the installation location with pip3 show tensorrt_llm, then check tensorrt_llm/models/__init__.py to see whether MODEL_MAP is as expected.
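A quick way to verify this against the installed package, as a minimal sketch (the architecture key "InternLM2ForCausalLM" is an assumption; the exact name registered in MODEL_MAP depends on the PR):

```python
# Check that the installed tensorrt_llm carries the InternLM2 mapping.
import tensorrt_llm
from tensorrt_llm.models import MODEL_MAP

# Shows where the package is installed (same info as pip3 show).
print(tensorrt_llm.__file__)

# Should print a model class, not None, if the patched build is active.
# "InternLM2ForCausalLM" is the assumed registry key here.
print(MODEL_MAP.get("InternLM2ForCausalLM"))
```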
@RunningLeon Which build method did you use? I installed trtllm from pip, and I'd like to align with your build method and try again.
https://github.com/NVIDIA/TensorRT-LLM/pull/266#issuecomment-2022311767
@RunningLeon Thanks a lot for your work! I'd like to ask whether the internlm2-20b network in InternVL-1.5 differs from the regular internlm2-20b. After converting it with the script in this PR, the output is all garbled.
@RunningLeon Hi, we fine-tuned the internlm2 model with LoRA. We can convert the base model to the llama format, but not the LoRA part; we tried modifying InternLM/tools/convert2llama.py to convert the LoRA weights to the llama style, but it did not work. Is there any other tool that works for LoRA? Note: we can't merge the base model and LoRA into one model, because we want to use https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-several-lora-checkpoints. So if we can convert the internlm2 LoRA to the llama style, we can then use examples/hf_lora_convert.py to build with TRT-LLM.
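One possible direction is renaming the adapter's weight keys from InternLM2 naming to llama naming before running examples/hf_lora_convert.py. A minimal sketch, assuming a PEFT-style adapter checkpoint and the usual InternLM2-to-llama name mapping; the filename and the mapping table are assumptions, and the fused attention.wqkv adapter cannot be renamed one-to-one (its lora_B would have to be split the same way the base wqkv is split):

```python
# Hypothetical key-renaming pass for an InternLM2 LoRA adapter.
# The mapping mirrors the base-model conversion and is an assumption;
# verify it against the actual keys in your adapter checkpoint.
import torch

RENAMES = {
    "attention.wo": "self_attn.o_proj",
    "feed_forward.w1": "mlp.gate_proj",
    "feed_forward.w3": "mlp.up_proj",
    "feed_forward.w2": "mlp.down_proj",
}

def rename_lora_keys(state_dict):
    out = {}
    for key, tensor in state_dict.items():
        if "attention.wqkv" in key:
            # Fused q/k/v LoRA weights need a real split, not a rename.
            raise NotImplementedError("split wqkv LoRA weights first")
        new_key = key
        for old, new in RENAMES.items():
            new_key = new_key.replace(old, new)
        out[new_key] = tensor
    return out

# "adapter_model.bin" is the PEFT default name; adjust to your path.
adapter = torch.load("adapter_model.bin", map_location="cpu")
torch.save(rename_lora_keys(adapter), "adapter_model_llama.bin")
```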
@nv-guomingz Hi, sorry to bother you, but when will this PR be merged? Do I need to fix the conflicts? THX
@nv-guomingz hi, the conflicts with main branch are resolved. Looking forward to your review comments. THX.
@RunningLeon May I ask why internlm2 needs its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.
Hi, in internlm2 the W_qkv weights are fused together, and some parameter names are not aligned with llama's, so llama's convert_checkpoint.py can't be used directly.
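To illustrate the fused-weight point: a minimal sketch of splitting a fused wqkv tensor into separate q/k/v projections. The per-KV-group packing [q_0 ... q_{r-1}, k, v] and the example shapes are assumptions based on InternLM2's GQA layout; the PR's convert_checkpoint.py is the authoritative reference:

```python
# Illustrative split of InternLM2's fused wqkv into q/k/v (GQA layout).
# Assumes rows are packed per KV group as [q * r, k, v] where
# r = num_heads // num_kv_heads.
import torch

def split_wqkv(wqkv, num_heads, num_kv_heads, head_dim):
    hidden = wqkv.shape[-1]
    r = num_heads // num_kv_heads  # query heads per KV group
    w = wqkv.view(num_kv_heads, r + 2, head_dim, hidden)
    q = w[:, :r].reshape(num_heads * head_dim, hidden)
    k = w[:, r].reshape(num_kv_heads * head_dim, hidden)
    v = w[:, r + 1].reshape(num_kv_heads * head_dim, hidden)
    return q, k, v

# Example with internlm2-chat-20b-like shapes (48 query heads,
# 8 KV heads, head_dim 128, hidden 6144 -- values for illustration).
wqkv = torch.randn((48 + 2 * 8) * 128, 6144)
q, k, v = split_wqkv(wqkv, 48, 8, 128)
print(q.shape, k.shape, v.shape)
```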
Thank you for this explanation!
Hi @RunningLeon, sorry for the late response due to internal task priorities. Would you please rebase the code first? I'll try to merge your MR into the main branch this week.
@nv-guomingz Done. Hope merging with main is OK.
Thanks @RunningLeon. Could you please squash your commits into a single commit? That would make further integration easier.
Hi @RunningLeon, I've filed the merge request in our internal repo and testing is ongoing. If everything goes well, this MR will be upstreamed next week. Thanks again for your contribution.
@RunningLeon InternLM2 has been added in today's update. Please see the notes here: https://github.com/NVIDIA/TensorRT-LLM/discussions/1726#discussion-6776859