[Question]: Performance impact of putting `filter` in different places of a chained hybrid-search call?
Describe your problem
I have a large database storing ColBERT tensors, and I want `match_tensor` ranking inside hybrid search's `rrf` or `weighted_sum` fusion (not only in the final reranking stage). Three questions below:
- My use case: although the database is large, keyword filtering can reduce the candidates to fewer than 10,000. I don't want ColBERT ranking the entire database (I know searching the whole database would cost too much time and resources). Can the ColBERT rankings join the `rrf` stage after filtering? If the filter can be executed first in the chain call, the `match_tensor` method will cost fewer resources, making `match_tensor` ranking in `rrf` and `weighted_sum` feasible.
- I am not very familiar with Infinity, and the documentation is still at an early stage; that's why I have such a simple question. A similar question: does it matter where `filter` is placed in a chain? E.g., comparing the following two examples, is there a difference?

  ```python
  table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).filter(...).fusion(method="rrf").to_pl()
  table_instance.output(...).filter(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).fusion(method="rrf").to_pl()
  ```
- Is it possible to use `match_dense`, `match_sparse`, and `match_text` to get the top 1000 first, and then, in the reranking stage, use `match_tensor` along with `match_dense`, `match_sparse`, and `match_text` in an `rrf` or `weighted_sum` over those 1000 to get the top n?
Tensor-based ranker and reranker are different things.
I assume that what you want is not the former, which ranks the whole dataset. So you can use `match_tensor` as a reranker (like `rrf` and `weighted_sum`), rather than as a ranker, through:

```python
table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).filter(...).fusion("match_tensor", ...).to_pl()
```

Actually, using `match_tensor` as a reranker is the recommended approach: the ranker is quite expensive and, what's more, the tensor-based ranker even has worse recall than the reranker. Unless we get a better tensor-based index in the future, one with a much smaller memory footprint as well as better recall, do not use the tensor ranker at any time.
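Conceptually, a tensor reranker's cost scales with the number of candidates it scores, which is why reranking only a filtered set is cheap. Below is a minimal, illustrative NumPy sketch of the ColBERT-style MaxSim score such a reranker computes per candidate (this is not Infinity's actual implementation; the shapes and random data are made up for illustration):

```python
import numpy as np

def maxsim_score(query_tensor: np.ndarray, doc_tensor: np.ndarray) -> float:
    """ColBERT late interaction: for each query token embedding, take its
    maximum dot product over all document token embeddings, then sum
    over the query tokens."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_tensor @ doc_tensor.T
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8))                      # 4 query tokens, dim 8
candidates = [rng.standard_normal((16, 8)) for _ in range(1000)]

# Scoring is linear in the candidate count, so reranking 1,000 filtered
# chunks costs roughly a tenth of scanning 10,000.
reranked = sorted(range(len(candidates)),
                  key=lambda i: maxsim_score(query, candidates[i]),
                  reverse=True)[:10]
```

The take-away is only about cost, not about Infinity's internals: the reranker touches just the candidates handed to it.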
Thanks a lot :) I know that `match_tensor` as a ranker is not recommended because of the high expense. I am still confused about the following two questions:
- I want to know: if `filter` is executed before `match_tensor`, reducing the candidates to within ~1000 (suppose the database has 10k long chunks), will the memory footprint go down significantly? @yingfeng
  So basically, you mean that even if `.filter` is placed before `.match_tensor`, `match_tensor` will still go through the entire database with the following code?

  ```python
  table_instance.output(...).filter(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).fusion(method="rrf").to_pl()
  ```
- You mentioned this method:

  ```python
  table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).filter(...).fusion("match_tensor", ...).to_pl()
  ```

  I find that `match_tensor` reranking is sometimes not as accurate as `rrf` or `weighted_sum`. I need a "majority vote" score combining the ColBERT rank with the other ranks (the `match_tensor` rank as one part of it). Can I achieve that? (Similar to the 3rd question in the main post.)
Yes. If you use `match_tensor` outside of `fusion`, it means using the tensor index to perform tensor recall. The tensor index is a pure in-memory solution and has much larger memory consumption than a general vector index, due to the structure of a tensor. However, if you put `match_tensor` within `fusion`, it means using a tensor-based reranker: it does not rely on any tensor index, just a pure tensor scan over storage. Additionally, you could apply binary quantization to the data before inserting it into Infinity, which can reduce the memory occupation to 1/32 of the original at the cost of slightly lower recall.

Regarding the second question: if the tensor reranker is not as accurate as `rrf`, maybe it's caused by some queries depending heavily on full-text search. In that case, you could choose:

```python
match_dense(...).fusion("match_tensor", ...).match_fulltext(...).fusion("rrf", ...)
```
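The 1/32 figure corresponds to keeping one sign bit per 32-bit float. A minimal sketch of that kind of binary quantization (illustrative only; this is not Infinity's internal encoding, just the arithmetic behind the memory claim):

```python
import numpy as np

def binarize(tensor: np.ndarray) -> np.ndarray:
    """Keep only the sign of each float32 value and pack 8 signs per byte."""
    bits = (tensor > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

rng = np.random.default_rng(0)
t = rng.standard_normal((32, 128)).astype(np.float32)  # one ColBERT tensor
packed = binarize(t)

original_bytes = t.nbytes        # 32 * 128 * 4  = 16384 bytes
quantized_bytes = packed.nbytes  # 32 * 128 / 8  = 512 bytes
# 16384 / 512 == 32: memory drops to 1/32 of the original,
# at the cost of discarding magnitude information (slightly lower recall).
```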
Thanks:)
- So, if I am not wrong, filtering cannot reduce the tensor index's query-time cost? @yingfeng
- You also mentioned this query for my previous 2nd question: `match_dense(...).fusion("match_tensor", ...)` (1st part) `.match_fulltext(...).fusion("rrf", ...)` (2nd part). Will the 2nd part be executed on the result of the 1st part? What is the 2nd part's `rrf` score based on: match_dense + match_tensor + match_fulltext, or just match_dense + match_fulltext?
- Yes
- `rrf` does not depend on the relevance scores of each recall at all, but `weighted_sum` does. The final result of `weighted_sum` is decided by `match_fulltext` and `match_dense`. You could run experiments on that.
- I ran some experiments. The numbers are pretty close, while the `_similarity` scores are much more volatile than the final score. It seems that `weighted_sum` is based on normalized ranking scores rather than on the raw similarity scores directly.
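That observation matches how the two fusion methods are usually defined: `rrf` looks only at each document's rank in each recall list, while `weighted_sum` combines scores after normalization. A small illustrative sketch of both (not Infinity's exact formulas; the RRF constant `k=60` and min-max normalization here are common conventions, assumed for illustration):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: the fused score depends only on each
    document's rank in each recall list, never on raw similarity values."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_sum_fuse(score_lists, weights):
    """Min-max normalize each recall's scores into [0, 1] before the
    weighted sum, flattening volatile raw similarity values."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

# Two recalls whose raw scores live on completely different scales:
dense = {"a": 0.91, "b": 0.90, "c": 0.10}  # cosine-like similarities
text  = {"b": 55.0, "c": 30.0, "a": 1.0}   # BM25-like scores
dense_rank = sorted(dense, key=dense.get, reverse=True)  # ["a", "b", "c"]
text_rank  = sorted(text,  key=text.get,  reverse=True)  # ["b", "c", "a"]

by_rrf = rrf_fuse([dense_rank, text_rank])
by_ws  = weighted_sum_fuse([dense, text], weights=[0.5, 0.5])
```

Because of the normalization step, the near-tied raw `_similarity` values 0.91 and 0.90 stay near-tied after fusion, which would explain the final scores being much less volatile than the raw similarities.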
Hi, how do I use `match_tensor` outside of `fusion`? I can't find it in the docs.
You can take a look at the example usage in `tensor_search.py`.