
Comparison of string pairs

Open rieck opened this issue 9 years ago • 2 comments

There are analysis tasks in which only the similarity between given pairs of strings needs to be computed. In this setting, computing a similarity matrix over all strings is clearly overkill, and it would be great if Harry could support this setting, e.g., through a dedicated command-line option.

rieck, May 04 '15 08:05
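To illustrate the saving: a full matrix over n strings needs n(n-1)/2 comparisons, whereas comparing k explicit pairs needs only k. The following is a minimal Python sketch, not Harry's implementation or command-line interface. It assumes a normalized Levenshtein similarity (1 - edit distance / length of the longer string), and the function names `levenshtein`, `similarity` and `compare_pairs` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def compare_pairs(pairs):
    """Compare only the given (x, y) pairs instead of building a full matrix."""
    return [(x, y, similarity(x, y)) for x, y in pairs]


if __name__ == "__main__":
    for x, y, s in compare_pairs([("kitten", "sitting"), ("harry", "harry")]):
        print(f"{x}\t{y}\t{s:.4f}")
```

The input here is a plain list of pairs; how such pairs would be supplied to Harry (file format, option name) is left open in this thread.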

Hello, it would also be good to output only the similarity scores that meet a threshold rather than all results.

gsever, Nov 15 '15 19:11

That's a very good idea. However, we would need to introduce a new representation and output format. Currently, Harry stores the computed similarity values in a matrix. The benefit of a threshold would be that many of the matrix entries could be omitted, and we would end up with a sparse representation. I'll put this on my TODO list.

rieck, Nov 16 '15 10:11
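A sparse, thresholded output could, for instance, be a list of coordinate triples (i, j, similarity) holding only the entries that meet the threshold, similar to the COO format used by sparse-matrix libraries. The following is a minimal Python sketch under assumptions: it is not Harry's code or output format, it reuses a normalized Levenshtein similarity as a stand-in measure, and `sparse_similarities` and its `threshold` parameter are hypothetical. Note that all n(n-1)/2 values are still computed; the saving is in what is stored and written out.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def sparse_similarities(strings, threshold):
    """Return (i, j, sim) triples for i < j, keeping only sim >= threshold."""
    entries = []
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        s = similarity(a, b)
        if s >= threshold:  # everything below the threshold is simply not stored
            entries.append((i, j, s))
    return entries


if __name__ == "__main__":
    for i, j, s in sparse_similarities(["harry", "harvey", "sally", "harry"], 0.6):
        print(f"{i}\t{j}\t{s:.4f}")
```

Because the matrix is symmetric, only the upper triangle (i < j) is emitted, which halves the output again compared with listing both (i, j) and (j, i).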