SetSimilaritySearch

Fix removing tokens that appear in query file but not index file from query sets

Open innovate-invent opened this issue 2 years ago • 3 comments

I believe this is an effective fix, but I am not entirely sure what the consequences of using negative indices are.

Resolves #13

innovate-invent, Jul 13 '22 01:07

Thanks for the pull request. I think a more robust solution is to modify the similarity function so that it takes the set sizes as additional arguments. That way the sizes used in the computation can differ from the sizes of the token sets passed into the function; e.g., we can use the actual query set size rather than the size of the subset of query tokens that exist in the index.

ekzhu, Jul 29 '22 22:07
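
To illustrate the suggestion above, here is a minimal sketch of a Jaccard similarity that takes the set sizes as explicit arguments. It is not the code from this pull request or the library's actual API; the function name `jaccard_with_sizes` and the toy data are assumptions made only for this example.

```python
def jaccard_with_sizes(overlap, size_x, size_y):
    """Jaccard similarity from an overlap count and two explicit set sizes.

    Hypothetical helper: passing the sizes separately lets the caller supply
    the true query size even when some query tokens were dropped earlier.
    """
    union = size_x + size_y - overlap
    return overlap / union if union > 0 else 0.0


# Toy data (assumed): "x" and "y" appear in the query but not in the index.
index_vocab = {"a", "b", "c"}
index_set = {"a", "b", "c"}
query_set = {"a", "b", "x", "y"}

# Tokens unknown to the index cannot match anything, so only known ones are compared.
known_query_tokens = query_set & index_vocab
overlap = len(known_query_tokens & index_set)

# Using the shrunken query size inflates the score.
inflated = jaccard_with_sizes(overlap, len(known_query_tokens), len(index_set))

# Using the actual query size gives the true Jaccard similarity.
actual = jaccard_with_sizes(overlap, len(query_set), len(index_set))

print(inflated, actual)
```

With these inputs the overlap is 2 in both calls, but the union is 3 when the filtered query size is used and 5 when the actual query size is used, so dropping unknown tokens without correcting the size overstates the similarity.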

I made the required changes. Can you help me verify if the changes are correct by adding a unit test for your scenario? Thanks!

ekzhu, Jul 29 '22 23:07

I ran the test on master and on this branch: it fails on master and passes here.

innovate-invent, Aug 02 '22 19:08