[Question]: Performance impact of putting `filter` in different places of a chained hybrid-search call?
Describe your problem
I have a large database storing ColBERT tensors, and I want `match_tensor` ranking inside hybrid search's `rrf` or `weighted_sum` fusion (not only in the final reranking stage). Three questions below:
- My use case: although the database is large, keyword filtering can reduce the candidates to fewer than 10,000. I don't want ColBERT ranking the entire database (I know searching the whole database would cost too much time and resources). Can the ColBERT rankings join the `rrf` stage after filtering? If the filter can be executed first in the chain call, the `match_tensor` method will cost fewer resources, making `match_tensor` ranking in `rrf` and `weighted_sum` feasible.
- I am not very familiar with Infinity, and the documentation is still at an early stage; that's why I have such a simple question. A similar question: does it matter where `filter` is placed in a chain? E.g., comparing the following two examples, is there a difference?

  ```python
  table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).filter(...).fusion(method="rrf").to_pl()
  table_instance.output(...).filter(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).fusion(method="rrf").to_pl()
  ```
- Is it possible to use `match_dense`, `match_sparse`, and `match_text` to get the top 1000 first, and then, in the reranking stage, use `match_tensor` along with `match_dense`, `match_sparse`, and `match_text` in an `rrf` or `weighted_sum` over those 1000 to get the top n?
Tensor-based ranker and reranker are different things.
I assume that what you want is not the former, which ranks the whole dataset. So you can use `match_tensor` as a reranker (like `rrf` and `weighted_sum`), rather than as a ranker, through:

```python
table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).filter(...).fusion("match_tensor", ...).to_pl()
```

Actually, using `match_tensor` as a reranker is the recommended approach: the ranker is quite expensive and, what's more, the tensor-based ranker even has worse recall than the reranker. Unless we get a better tensor-based index in the future, one with a much smaller memory footprint as well as better recall, do not use the tensor ranker at any time.
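Conceptually, a tensor reranker's cost scales with the number of candidates it scores, which is why reranking only a filtered set is cheap. Below is a minimal, illustrative NumPy sketch of the ColBERT-style MaxSim score such a reranker computes per candidate (this is not Infinity's actual implementation; the shapes and random data are made up for illustration):

```python
import numpy as np

def maxsim_score(query_tensor: np.ndarray, doc_tensor: np.ndarray) -> float:
    """ColBERT late interaction: for each query token embedding, take its
    maximum dot product over all document token embeddings, then sum
    over the query tokens."""
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_tensor @ doc_tensor.T
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 8))                      # 4 query tokens, dim 8
candidates = [rng.standard_normal((16, 8)) for _ in range(1000)]

# Scoring is linear in the candidate count, so reranking 1,000 filtered
# chunks costs roughly a tenth of scanning 10,000.
reranked = sorted(range(len(candidates)),
                  key=lambda i: maxsim_score(query, candidates[i]),
                  reverse=True)[:10]
```

The take-away is only about cost, not about Infinity's internals: the reranker touches just the candidates handed to it.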
Thanks a lot :) I know that `match_tensor` as a ranker is not recommended because of the high expense. I am still confused about the following two questions:
- I want to know: if `filter` is executed before `match_tensor`, reducing the candidates to within ~1000 (suppose the database has 10k long chunks), will the memory footprint go down significantly? @yingfeng
  So basically, you mean that even if `.filter` is placed before `.match_tensor`, `match_tensor` will still go through the entire database with the following code?

  ```python
  table_instance.output(...).filter(...).match_dense(...).match_sparse(...).match_text(...).match_tensor(...).fusion(method="rrf").to_pl()
  ```
- You mentioned this method:

  ```python
  table_instance.output(...).match_dense(...).match_sparse(...).match_text(...).filter(...).fusion("match_tensor", ...).to_pl()
  ```

  I find that `match_tensor` reranking is sometimes not as accurate as `rrf` or `weighted_sum`. I need a "majority vote" score combining the ColBERT rank with the other ranks (the `match_tensor` rank as one part of it). Can I achieve that? (Similar to the 3rd question in the main post.)
Yes. If you use `match_tensor` outside of `fusion`, it means using the tensor index to perform tensor recall. The tensor index is a pure in-memory solution and has much larger memory consumption than a general vector index, due to the structure of a tensor. However, if you put `match_tensor` within `fusion`, it means using a tensor-based reranker: it does not rely on any tensor index, just a pure tensor scan over storage. Additionally, you could apply binary quantization to the data before inserting it into Infinity, which can reduce the memory occupation to 1/32 of the original at the cost of slightly lower recall.

Regarding the second question: if the tensor reranker is not as accurate as `rrf`, maybe it's caused by some queries depending heavily on full-text search. In that case, you could choose:

```python
match_dense(...).fusion("match_tensor", ...).match_fulltext(...).fusion("rrf", ...)
```
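The 1/32 figure corresponds to keeping one sign bit per 32-bit float. A minimal sketch of that kind of binary quantization (illustrative only; this is not Infinity's internal encoding, just the arithmetic behind the memory claim):

```python
import numpy as np

def binarize(tensor: np.ndarray) -> np.ndarray:
    """Keep only the sign of each float32 value and pack 8 signs per byte."""
    bits = (tensor > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

rng = np.random.default_rng(0)
t = rng.standard_normal((32, 128)).astype(np.float32)  # one ColBERT tensor
packed = binarize(t)

original_bytes = t.nbytes        # 32 * 128 * 4  = 16384 bytes
quantized_bytes = packed.nbytes  # 32 * 128 / 8  = 512 bytes
# 16384 / 512 == 32: memory drops to 1/32 of the original,
# at the cost of discarding magnitude information (slightly lower recall).
```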
Thanks:)
- So, if I am not wrong, filtering cannot reduce the tensor index's query-time cost? @yingfeng
- You also mentioned this query for my previous 2nd question: `match_dense(...).fusion("match_tensor", ...)` (1st part) `.match_fulltext(...).fusion("rrf", ...)` (2nd part). Will the 2nd part be executed on the result of the 1st part? What is the 2nd part's `rrf` score based on: match_dense + match_tensor + match_fulltext, or just match_dense + match_fulltext?
- Yes
- `rrf` does not depend on the relevance scores of each recall at all, but `weighted_sum` does. The final result of `weighted_sum` is decided by `match_fulltext` and `match_dense`. You could run experiments on that.
- I ran some experiments. The numbers are pretty close, while the `_similarity` scores are much more volatile than the final score. It seems that `weighted_sum` is based on normalized ranking scores rather than on the raw similarity scores directly.
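That observation matches how the two fusion methods are usually defined: `rrf` looks only at each document's rank in each recall list, while `weighted_sum` combines scores after normalization. A small illustrative sketch of both (not Infinity's exact formulas; the RRF constant `k=60` and min-max normalization here are common conventions, assumed for illustration):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: the fused score depends only on each
    document's rank in each recall list, never on raw similarity values."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_sum_fuse(score_lists, weights):
    """Min-max normalize each recall's scores into [0, 1] before the
    weighted sum, flattening volatile raw similarity values."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

# Two recalls whose raw scores live on completely different scales:
dense = {"a": 0.91, "b": 0.90, "c": 0.10}  # cosine-like similarities
text  = {"b": 55.0, "c": 30.0, "a": 1.0}   # BM25-like scores
dense_rank = sorted(dense, key=dense.get, reverse=True)  # ["a", "b", "c"]
text_rank  = sorted(text,  key=text.get,  reverse=True)  # ["b", "c", "a"]

by_rrf = rrf_fuse([dense_rank, text_rank])
by_ws  = weighted_sum_fuse([dense, text], weights=[0.5, 0.5])
```

Because of the normalization step, the near-tied raw `_similarity` values 0.91 and 0.90 stay near-tied after fusion, which would explain the final scores being much less volatile than the raw similarities.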
Hi, how do I use `match_tensor` outside of `fusion`? I can't find it in the docs.
You can take a look at the example usage in `tensor_search.py`.