
feat: support Triton with Nvidia embedders


Related Issues

  • support for Triton-hosted embedding models: almost any Hugging Face model can be hosted on Triton and thus benefit from Triton's hosting and scaling features. This is especially helpful if you want to self-host arbitrary open-source models using NVIDIA's inference engines (see the request sketch below).
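
For context, Triton serves models over the standard KServe v2 HTTP API, so an embedding model hosted this way can be queried directly. The sketch below is only illustrative: the model name (`multilingual_e5_base`) and the input/output tensor names (`text`, `embedding`) are assumptions and depend on how the model repository is configured.

```python
# Minimal sketch: querying a Triton-hosted embedding model via the KServe v2
# HTTP API. Model and tensor names are illustrative assumptions.
import requests

TRITON_URL = "http://localhost:8000"          # default Triton HTTP port
MODEL_NAME = "multilingual_e5_base"           # hypothetical model name

payload = {
    "inputs": [
        {
            "name": "text",                   # assumed input tensor name
            "shape": [1, 1],
            "datatype": "BYTES",
            "data": ["query: What is Haystack?"],
        }
    ],
    "outputs": [{"name": "embedding"}],       # assumed output tensor name
}

response = requests.post(
    f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    timeout=30,
)
response.raise_for_status()
embedding = response.json()["outputs"][0]["data"]
print(f"Got embedding of length {len(embedding)}")
```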

Proposed Changes:

  • add Triton backend support for the Nvidia embedders (see the usage sketch below)
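
A rough idea of how this could be used from Haystack; the exact parameters (e.g. whether `api_url` simply points at the Triton server or a dedicated backend selector is added) are assumptions and may differ from the final implementation in this PR.

```python
# Usage sketch (parameter names are assumptions, not the final API): point the
# Nvidia text embedder at a self-hosted Triton endpoint instead of the hosted
# NVIDIA API.
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

embedder = NvidiaTextEmbedder(
    model="intfloat-multilingual-e5-base",   # hypothetical model name on Triton
    api_url="http://localhost:8000",         # local Triton HTTP endpoint
)
# NVIDIA_API_KEY may or may not be required for a self-hosted endpoint; set the
# environment variable if the embedder expects it.
embedder.warm_up()
result = embedder.run(text="What is Haystack?")
print(len(result["embedding"]))
```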

How did you test it?

  • added tests
  • ran integration tests locally against a Triton embedding endpoint
  • a sample embedding Triton server can be started with (see the readiness-check sketch below):
    docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 tstadelds/triton-embedding:intfloat-multilingual-e5-base-o4-v0.0.2
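
To sanity-check that the sample server is up before running the integration tests, Triton's standard health and model-metadata endpoints can be queried. The model name below is an assumption based on the image tag.

```python
# Quick readiness check against the sample Triton server started above.
import requests

TRITON_URL = "http://localhost:8000"
MODEL_NAME = "intfloat-multilingual-e5-base"   # hypothetical model name

# Server-level readiness (standard Triton / KServe v2 endpoint).
ready = requests.get(f"{TRITON_URL}/v2/health/ready", timeout=5)
print("server ready:", ready.status_code == 200)

# Model metadata: lists the model's input/output tensor names and shapes.
meta = requests.get(f"{TRITON_URL}/v2/models/{MODEL_NAME}", timeout=5)
print(meta.json())
```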
    

Notes for the reviewer

  • a template for creating Triton images can be found here: https://github.com/deepset-ai/nvidia-triton-inference

Checklist
