[Model Requests] Add support for GLM-4 series

Open HLSS-Hen opened this issue 8 months ago • 9 comments

GLM-4 and GLM-4V are the next-generation successors to ChatGLM3 and CogVLM2. The model repository is here: https://github.com/THUDM/GLM-4/

The GLM-4 model is very similar to ChatGLM3, so only a slight modification is needed: https://github.com/THUDM/GLM-4/issues/132#issuecomment-2178031221

The GLM-4V model is similar to CogVLM2 (https://github.com/NVIDIA/TensorRT-LLM/issues/1644); it just replaces the language backbone with GLM-4 and removes the visual experts. It has better performance and even better accuracy.

Please add official support. I believe TensorRT support would make this a better choice for CUDA devices.

cc @ncomly-nvidia

HLSS-Hen, Jun 24 '24 12:06