[Feature]: Batch embedding and reranking

Open • twright8 opened this issue 9 months ago • 1 comment

🚀 The feature, motivation and pitch

Would it be possible to add support for embedding generation and reranking to Aphrodite? Right now, a RAG setup often means running both vLLM and Aphrodite side by side, and it’d be great if Aphrodite could handle the whole pipeline.

Something like vLLM’s pooling models (https://docs.vllm.ai/en/v0.6.5/models/pooling_models.html) could work: it would let us generate embeddings in batches and rerank retrieved passages without switching tools. That would remove the overhead of running two inference servers and simplify deployment.
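
A rough sketch of the workflow in question, assuming an OpenAI-style embeddings request body (a `model` name plus a list-valued `input`). The model name, the helper names, and the toy vectors are all hypothetical, and nothing is sent over the network, since Aphrodite does not expose such an endpoint today. The ranking step uses plain cosine similarity as a stand-in for a real reranker model.

```python
import math


def build_embedding_request(texts, model="example-embedding-model"):
    """Build one batched request body in the OpenAI embeddings format.

    The model name is a placeholder, not a real checkpoint.
    """
    return {"model": model, "input": list(texts)}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy data: these vectors stand in for embeddings a server would return.
docs = ["doc a", "doc b", "doc c"]
payload = build_embedding_request(docs)  # one request for the whole batch
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
order = rank_by_similarity(query_vec, doc_vecs)
```

With a single server handling both steps, the batch request and the ranking pass would target the same deployment instead of two.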

Curious if this would be doable. Appreciate your thoughts!

Alternatives

No response

Additional context

No response

Mar 10 '25 11:03 twright8

Yes, this is definitely doable. I'll work on this as soon as I have the bandwidth.

Mar 10 '25 11:03 AlpinDale