Support for ColBERT-style late interaction models in rerank endpoint

Open wwymak opened this issue 1 year ago • 0 comments

Feature request

There has been discussion of ColBERT-style models performing well as rerankers (e.g. https://www.answer.ai/posts/2024-09-16-rerankers.html), and it would be useful if the rerank endpoint could support these as well.

Motivation

It would give more options for which models can be used for reranking, and ColBERT-style models are likely to have lower latency than comparably sized cross-encoders. Since Infinity already supports ColBERT embeddings, it might not be too much work to add support.
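
For reference, here is a rough sketch of the late-interaction (MaxSim) scoring that a ColBERT-style reranker would need on top of the per-token embeddings: for each query token, take the maximum dot product over the document's tokens, then sum those maxima. The function names (`maxsim_score`, `rerank`) are hypothetical and not part of Infinity's API, and the sketch assumes the token embeddings have already been computed and normalized.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    if not query_tokens or not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(
    query_tokens: Sequence[Vector],
    docs_tokens: Sequence[Sequence[Vector]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending MaxSim score."""
    scores = [maxsim_score(query_tokens, doc) for doc in docs_tokens]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy example with 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0]]              # matches only the first query token
ranking = rerank(query, [doc_b, doc_a])
```

Because document token embeddings do not depend on the query, they could be computed once and reused, with only this cheap scoring step run per query.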

Your contribution

I am willing to attempt an implementation.

Dec 28 '24 20:12 wwymak