TensorRT-LLM
[Model Requests] Add support for GLM-4 series
GLM-4 and GLM-4V are the next-generation models of ChatGLM3 and CogVLM2, respectively. The model repository is here: https://github.com/THUDM/GLM-4/
The GLM-4 model is very similar to ChatGLM3; only a slight modification is needed: https://github.com/THUDM/GLM-4/issues/132#issuecomment-2178031221
The GLM-4V model is similar to CogVLM2 (https://github.com/NVIDIA/TensorRT-LLM/issues/1644); just replace the language backbone with GLM-4 and remove the visual experts. It has better performance and even better accuracy.
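The relationship described above can be sketched as a toy comparison. This is purely illustrative Python, not TensorRT-LLM code; the class, field names, and component labels (e.g. the vision encoder name) are assumptions made for the sketch:

```python
# Conceptual sketch: how GLM-4V relates to CogVLM2 (illustrative only).
from dataclasses import dataclass

@dataclass
class VLMArchitecture:
    vision_encoder: str
    language_backbone: str
    # CogVLM2 adds per-layer "visual expert" weights; GLM-4V drops them.
    has_visual_experts: bool

# CogVLM2: vision encoder + LLM backbone + visual-expert branches.
cogvlm2 = VLMArchitecture(
    vision_encoder="EVA-CLIP",        # assumed label for the sketch
    language_backbone="Llama-3-8B",
    has_visual_experts=True,
)

# GLM-4V: same overall recipe, but the backbone is GLM-4 and the
# visual-expert branches are removed.
glm4v = VLMArchitecture(
    vision_encoder="EVA-CLIP",        # assumed label for the sketch
    language_backbone="GLM-4-9B",
    has_visual_experts=False,
)

print(glm4v.language_backbone)   # GLM-4-9B
print(glm4v.has_visual_experts)  # False
```

The point of the sketch: supporting GLM-4V should be close to the existing CogVLM2 path, since the change is a backbone swap plus the removal of one component.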
Please add official support; I believe official TensorRT-LLM support is the best choice for CUDA devices.
cc @ncomly-nvidia
- @ncomly-nvidia @AdamzNV for visibility
I will take a look at this. Thanks!
Thank you for your awesome work! Any updates on this issue? Your efforts are greatly appreciated.
GLM4 will be supported in v0.12. Thanks!
Great work, thank you so much for the rapid TRT support. Wondering when v0.12 will be released? Thanks @syuoni
GLM4 is already available on Github main, see https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/chatglm
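For reference, the typical workflow in that example directory looks like the sketch below. The exact flags and paths are assumptions and may differ by TensorRT-LLM version; check examples/chatglm/README.md for the authoritative commands:

```shell
# Hedged sketch of the usual TensorRT-LLM example flow (flags may vary by version).
# 1. Convert the Hugging Face checkpoint to a TensorRT-LLM checkpoint.
python convert_checkpoint.py \
    --model_dir ./glm-4-9b-chat \
    --output_dir ./trt_ckpt \
    --dtype float16

# 2. Build the TensorRT engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./trt_ckpt \
    --output_dir ./trt_engine \
    --gemm_plugin float16

# 3. Run inference with the built engine.
python3 ../run.py \
    --engine_dir ./trt_engine \
    --tokenizer_dir ./glm-4-9b-chat \
    --input_text "Hello" \
    --max_output_len 50
```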
v0.12 is planned to be released by the end of this month.
Thanks so much. What about GLM-4v-9b, any plan to support it in the upcoming v0.12?
No, GLM-4v-9b will not be in v0.12. It may be supported in a future version.
Thanks, looking forward to it.
Hi @rambleramble, do you still have any further issue or question? If not, we'll close this soon.
No issue atm, thank you so much!