juney-nvidia

Results 117 comments of juney-nvidia

+ @sunnyqgg for visibility, in case she has more input on this question. June

@Sesameisgod I am not aware that the TensorRT engine building process can use swap memory during offline engine building. As an alternative, you can try to run...

@Sesameisgod Just to make sure you are aware of this Qwen2.5-VL effort from @yechank-nvidia: https://github.com/NVIDIA/TensorRT-LLM/pull/3156/files Thanks, June

@bebilli Hi bebilli, We haven't finalized the plan to support Gemma 3 yet. If you are interested, you are welcome to contribute this model support to TensorRT-LLM, and we...

@bebilli Hi, I would recommend using the PyTorch workflow to add Gemma 3 model support, since it has a gentler learning curve for AI application developers. You can follow this guide:...

> [@juney-nvidia](https://github.com/juney-nvidia) If the method you mentioned is used, is it necessary to convert to the native TensorRT format before inference? If conversion is not required, can the performance match...

> Thank you for your guidance. I'll go and give it a try. Thanks, looking forward to your contribution MR :) June

@shahizat Hi Shahizat, I think Qwen2.5 in general is supported in TRT-LLM: - https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwen#support-matrix While it is possible that there are some small issues specific to the model...

Closing this since it is already solved. Thanks @lkm2835 for supporting the community :)

@byshiue @QiJune Can you help review this EXAONE model MR? Thanks, June