linktransformer
Suggestion to implement range_search
Hi All, again - wonderful package and just terrific work.
One possible extension you might one day consider would be using FAISS's range_search function instead of search (see https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes#range-search). This would allow for a "many-to-many" match in the more traditional sense, perhaps aligning the behaviour of the LT package with prior fuzzy matching packages.
The main drawback is that it is not GPU-friendly, but it works pretty efficiently on CPUs in my experience.
FWIW, my use case is matching the universe of job postings to DnB establishments. I use range_search along with your firm-name embeddings to build a dataset of all pairwise matches above a fairly low similarity threshold (0.5). This gives me a huge set of potential matches, to which I then apply an expectation-maximisation algorithm that considers both the similarity scores and other structured covariates (not necessarily exact-matching criteria) such as industry codes, location distance, etc., to resolve the best match from this candidate set.
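As a rough illustration of the candidate-set step, the sketch below expands the flat (lims, distances, indices) output that range_search returns into one row per candidate pair. The function name, the ID labels, and the toy numbers are all hypothetical, and the expectation-maximisation stage that follows is not shown.

```python
def expand_range_results(lims, distances, indices, query_ids, corpus_ids):
    """Turn flat range_search output into (query_id, corpus_id, score) rows."""
    pairs = []
    for q in range(len(lims) - 1):
        # Hits for query q occupy positions lims[q] .. lims[q + 1] - 1.
        for pos in range(int(lims[q]), int(lims[q + 1])):
            corpus_id = corpus_ids[int(indices[pos])]
            pairs.append((query_ids[q], corpus_id, float(distances[pos])))
    return pairs


# Toy values in the shape range_search would return (made up for illustration).
lims = [0, 2, 3]
distances = [0.91, 0.55, 0.62]
indices = [0, 2, 1]

candidate_pairs = expand_range_results(
    lims,
    distances,
    indices,
    query_ids=["posting_a", "posting_b"],
    corpus_ids=["estab_0", "estab_1", "estab_2"],
)
```

Each row can then be joined to covariates such as industry codes or location before the best match is resolved.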
I would be happy to help implement this one day, if you feel it's something you would want to pursue.
Thanks again for all the great work, it's hugely appreciated by many!