TensorRT-LLM Support for text embedding models

With the popularity of RAG, it would be great if TensorRT-LLM supported text-embedding and re-ranking models from sentence-transformers.

Mar 01 '24 16:03 SupreethRao99

+1 to this. It would be great if embedding models could be served on Triton servers.

Apr 16 '24 02:04 jasonngap1

Is there any update on this? I'm also interested in this capability.

Nov 06 '24 16:11 FernandoDorado

cc @ncomly-nvidia @AdamzNV @laikhtewari for vis

Nov 14 '24 07:11 nv-guomingz

Is there any plan to have this feature in the roadmap?

Jan 13 '25 13:01 FernandoDorado

Just following up on this too! Would really appreciate it if there was text embedding support!

Oct 09 '25 11:10 neilbhutada

@SupreethRao99, @FernandoDorado, @neilbhutada: there were discussions about this, but the team ultimately decided to stay focused on text generation rather than take on the additional complexity of supporting models with fundamentally different characteristics.

Oct 21 '25 22:10 karljang

Issue has not received an update in over 14 days. Adding stale label.

Nov 05 '25 03:11 github-actions[bot]