Support for ColBERT-style late interaction models in the rerank endpoint
Feature request
There have been reports of ColBERT-style models performing well as rerankers (e.g. https://www.answer.ai/posts/2024-09-16-rerankers.html), and it would be useful if the rerank endpoint could support them as well.
Motivation
It would give more options for which models can be used for reranking, and ColBERT-style models are likely to have lower latency than comparably sized cross-encoders. Since Infinity already supports ColBERT embeddings, it might not be too much work to add support.
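For reference, here is a rough sketch of the late-interaction (MaxSim) scoring I have in mind, assuming per-token embeddings for the query and each document are already available. The names `maxsim_score` and `rerank` are hypothetical and not part of Infinity's API, and the vectors are toy values made up for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Hypothetical helper: ColBERT-style MaxSim score.

    For each query token, take the maximum cosine similarity over all
    document tokens, then sum those maxima over the query tokens.
    Assumes both token lists are non-empty.
    """
    q = [_normalize(v) for v in query_tokens]
    d = [_normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rerank(
    query_tokens: Sequence[Vector],
    docs_tokens: Sequence[Sequence[Vector]],
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (doc_index, score) pairs, best first."""
    scores = [(i, maxsim_score(query_tokens, d)) for i, d in enumerate(docs_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional token embeddings, made up for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
]
ranking = rerank(query, docs)
print(ranking)
```

Because the document token embeddings do not depend on the query, they could in principle be computed once and cached, which is where I would expect the latency advantage over cross-encoders to come from.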
Your contribution
I am willing to attempt an implementation.