juney-nvidia
+ @sunnyqgg for visibility, in case she has more input on this question. June
@Sesameisgod I am not aware that the TensorRT engine building process can use swap memory during offline engine building. As an alternative, you can try to run...
@Sesameisgod Making sure you are aware of this Qwen2.5-VL effort from @yechank-nvidia: https://github.com/NVIDIA/TensorRT-LLM/pull/3156/files Thanks, June
@bebilli Hi bebilli, We haven't finalized a plan to support Gemma 3 yet. If you are interested, you are welcome to contribute support for this model to TensorRT-LLM, and we...
@bebilli Hi, I would recommend using the PyTorch workflow to add Gemma 3 model support, which has a gentler learning curve for AI application developers. You can follow this guide:...
> [@juney-nvidia](https://github.com/juney-nvidia) If the method you mentioned is used, is it necessary to convert to the native TensorRT format before inference? If conversion is not required, can the performance match...
> Thank you for your guidance. I'll go and give it a try. Thanks, looking forward to your contribution MR :) June
@shahizat Hi Shahizat, I think Qwen 2.5 is generally supported in TRT-LLM: - https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwen#support-matrix That said, it is possible there are some small issues specific to the model...
Closing this since it is already solved. Thanks @lkm2835 for supporting the community :)
@byshiue @QiJune Can you help review this EXAONE model MR? Thanks, June