lunr.js

Find similar documents

Open stupkad opened this issue 5 years ago • 3 comments

Hi there,

I am using lunr for a private wiki and love the library. Is there a way to find similar documents, as discussed in this article?

https://stackoverflow.com/questions/7657673/how-to-find-similar-documents

Regards, Dietmar

stupkad avatar Dec 11 '19 19:12 stupkad

Finding similar documents is not currently supported.

Almost everything required to implement this feature is already supported, though. At index time, all documents are converted into term vectors, which are stored in the index. When querying, the search query is also converted into a similar vector. The similarity between the query and a document is computed by comparing these vectors.

So, since all the documents are already represented as vectors, being able to get a list of similar documents is just a matter of looking up the vector for the given document ID, then doing a similarity search with all the other documents.

There might be ways to optimise this, but those are the basics of how to implement it.
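
A rough sketch of that idea in plain JavaScript, deliberately not tied to lunr's internals: it assumes the term vectors have already been pulled out into a plain object keyed by document ID, which is a made-up layout for illustration. The names `vectors`, `cosine` and `findSimilar` are hypothetical, and cosine similarity is used here as a stand-in; lunr's own scoring may weight or normalise differently.

```javascript
// Hypothetical sparse term vectors keyed by document ID: { term: weight }.
// The IDs, terms and weights are invented for this example.
const vectors = {
  a: { search: 2, index: 1 },
  b: { search: 1, index: 1, vector: 1 },
  c: { wiki: 3 },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Dot product of two sparse vectors, walking the terms of the first.
function dot(u, v) {
  let sum = 0;
  for (const term of Object.keys(u)) {
    if (has(v, term)) sum += u[term] * v[term];
  }
  return sum;
}

// Cosine similarity; returns 0 when either vector has zero magnitude.
function cosine(u, v) {
  const denom = Math.sqrt(dot(u, u) * dot(v, v));
  return denom === 0 ? 0 : dot(u, v) / denom;
}

// Look up the vector for docId, then score it against every other document
// and return the best matches, highest score first.
function findSimilar(vectors, docId, limit = 10) {
  if (!has(vectors, docId)) throw new Error(`Unknown document: ${docId}`);
  const target = vectors[docId];
  return Object.keys(vectors)
    .filter((id) => id !== docId)
    .map((id) => ({ ref: id, score: cosine(target, vectors[id]) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Calling `findSimilar(vectors, "a", 2)` ranks `b` ahead of `c`, since `a` and `b` share terms and `c` shares none. This compares against every document, so it is linear in the size of the index per lookup.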

olivernn avatar Jan 21 '20 18:01 olivernn

The code that does the similarity on vectors for querying is here.

olivernn avatar Jan 21 '20 18:01 olivernn

Hmmmm, we need to present a deep neural encoding of the search query and use it as a feature in the search, either as the only feature or as part of the query. This doesn't look too hard to implement, @rflow?

rjurney avatar Jan 09 '22 17:01 rjurney