aphrodite-engine
[Feature]: Batch embedding and reranking
🚀 The feature, motivation and pitch
Would it be possible to add support for embedding generation and reranking in Aphrodite? Right now, a RAG setup often means running both vLLM and Aphrodite side by side, but it’d be great if Aphrodite could handle the whole pipeline.
Something like vLLM’s pooling models (https://docs.vllm.ai/en/v0.6.5/models/pooling_models.html) could work, letting us generate embeddings efficiently and improve retrieval without switching tools. This would speed things up and simplify deployment.
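For context, here is a rough sketch of the retrieval step that embeddings would feed into. It is plain Python with toy vectors standing in for real embeddings; the function names `cosine_similarity` and `rank_by_similarity` are made up for illustration and are not part of Aphrodite's or vLLM's API. In a real setup the vectors would come from an embedding model, and a reranker would then re-score the shortlist with a dedicated model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=2):
    """Return (doc_index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors (assumption: placeholders for model-generated embeddings).
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]

top = rank_by_similarity(query_vec, doc_vecs)
print([index for index, _ in top])
```

Having both the embedding and reranking steps served by one engine would mean this shortlist never has to leave the same deployment.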
Curious if this would be doable. Appreciate your thoughts!
Yes, this is definitely doable. I'll work on this as soon as I have the bandwidth.