harry
Comparison of string pairs
Some analysis tasks only require the similarity between specific pairs of strings. In this setting, computing a similarity matrix over all strings is clearly overkill. It would be great if Harry could support this setting, e.g. using a special command-line option.
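To illustrate the difference from a full matrix, here is a minimal Python sketch of pairwise comparison. It is not Harry's code or interface: the names `levenshtein` and `compare_pairs` are hypothetical, and Levenshtein distance is only a stand-in for whichever measure would actually be configured.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def compare_pairs(pairs):
    """Score only the requested pairs (hypothetical helper, not Harry's API)."""
    return [(x, y, levenshtein(x, y)) for x, y in pairs]


if __name__ == "__main__":
    for x, y, d in compare_pairs([("kitten", "sitting"), ("harry", "sally")]):
        print(x, y, d)
```

With n input pairs this performs n comparisons, whereas a full matrix over the 2n strings involved would need on the order of n² comparisons.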
Hello, it would also be good to output only the similarity scores that pass a threshold rather than all results.
That's a very good idea. However, we would need to introduce a new representation and output format. Currently, Harry stores computed similarity values in a matrix. The benefit of a threshold would be that many of the matrix entries could be omitted, so we would end up with a sparse representation. I'll put this on my TODO list.
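As a rough illustration of such a sparse representation, the following Python sketch keeps only the entries that reach a threshold, stored in a dictionary keyed by index pair. It is an assumption-laden sketch rather than Harry's implementation: `sparse_similarities` is a hypothetical name, and `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for the real similarity measure.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not one of Harry's measures."""
    return SequenceMatcher(None, a, b).ratio()


def sparse_similarities(strings, threshold, sim=similarity):
    """Return {(i, j): score} for i < j, keeping only scores >= threshold."""
    sparse = {}
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        score = sim(a, b)
        if score >= threshold:
            sparse[(i, j)] = score
    return sparse


if __name__ == "__main__":
    data = ["abc", "abc", "xyz"]
    for (i, j), score in sparse_similarities(data, 0.5).items():
        print(i, j, score)
```

Note that this sketch still computes every pairwise score; the threshold only reduces what is stored and written out. For a distance measure, the comparison would have to be flipped to keep values at or below the threshold.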