Support for text embedding models
With the popularity of RAG, it would be great if TensorRT-LLM supported text-embedding and re-ranking models from sentence-transformers.
+1 to this. It would be great if embedding models could be served on Triton servers.
Is there any update on this? I'm also interested in this capability.
cc @ncomly-nvidia @AdamzNV @laikhtewari for vis
Is there any plan to put this feature on the roadmap?
Just following up on this too! Would really appreciate it if there was text embedding support!
@SupreethRao99, @FernandoDorado, @neilbhutada: there were discussions about this, but the team ultimately decided to stay focused on text generation rather than take on the additional complexity of supporting models with fundamentally different characteristics.
Issue has not received an update in over 14 days. Adding stale label.