lunr.js
Find similar documents
Hi there,
I am using lunr for a private wiki and love the library. Is there a way to find similar documents, like discussed in this article?
https://stackoverflow.com/questions/7657673/how-to-find-similar-documents
Regards, Dietmar
Finding similar documents is not currently supported.
Almost everything required to implement this feature is already in place, though. At index time, all documents are converted into term vectors, which are stored in the index. When querying, the search query is also converted into a similar vector. A document's score for a query is computed by measuring the similarity between the query vector and the document's vector.
So, since all the documents are already represented as vectors, being able to get a list of similar documents is just a matter of looking up the vector for the given document ID, then doing a similarity search with all the other documents.
There might be ways this can be optimised, but those are the basics of how to implement it.
The code that does the similarity on vectors for querying is here.
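To make the idea concrete, here is a minimal sketch of the lookup-then-compare approach described above. It does not use lunr's internals: the `vectors` map, `cosineSimilarity`, and `findSimilar` are hypothetical names for illustration, and cosine similarity over plain term-weight maps is an assumption standing in for whatever the index actually stores.

```javascript
// Hypothetical stand-in for the index: document ID -> sparse term vector
// (term -> weight). None of these names are part of lunr's API.

// Cosine similarity between two sparse vectors stored as Maps.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, weight] of a) {
    if (b.has(term)) dot += weight * b.get(term);
  }
  const norm = (v) =>
    Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Look up the vector for docId, then score it against every other document.
function findSimilar(vectors, docId, limit = 5) {
  const target = vectors.get(docId);
  if (!target) return [];
  const results = [];
  for (const [otherId, vector] of vectors) {
    if (otherId === docId) continue; // skip the document itself
    results.push({ ref: otherId, score: cosineSimilarity(target, vector) });
  }
  // Highest score first, truncated to the requested number of results.
  return results.sort((x, y) => y.score - x.score).slice(0, limit);
}

// Toy data: "a" and "b" share terms, "c" shares none with "a".
const vectors = new Map([
  ["a", new Map([["search", 1], ["index", 1]])],
  ["b", new Map([["search", 1], ["index", 1], ["vector", 1]])],
  ["c", new Map([["cooking", 1], ["recipe", 1]])],
]);
const similarToA = findSimilar(vectors, "a");
```

This compares the target against every document, so it is linear in the size of the collection per lookup; that is probably fine for a private wiki, and it is the part that could be optimised later.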
Hmmmm, we need to present a deep neural encoding of the search query and use it as a feature in the search, either as the only feature or as part of the query. This doesn't look too hard to implement, @rflow?