TensorRT-LLM
[Model Requests] Add support for GLM-4 series
GLM-4 and GLM-4V are the next-generation models of ChatGLM3 and CogVLM2, respectively. The model repository is here: https://github.com/THUDM/GLM-4/
The GLM-4 model is very similar to ChatGLM3, so only a slight modification is needed: https://github.com/THUDM/GLM-4/issues/132#issuecomment-2178031221
The GLM-4V model is similar to CogVLM2 (https://github.com/NVIDIA/TensorRT-LLM/issues/1644): just replace the language backbone with GLM-4 and remove the visual experts. It offers better performance and even better accuracy than CogVLM2.
Please add official support. I believe TensorRT's blessing makes it the better choice for running these models on CUDA devices.
cc @ncomly-nvidia