Alex Herbert

48 comments by Alex Herbert

Hi @ameyjadiye. Thanks for the contribution. I've had a look at this similarity as it is essentially a binary scoring metric: [Sørensen–Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) Thus you must compute the matches (true-positives)...

Hi @ameyjadiye, the code is now a good working implementation. However, it should be noted that the Sørensen–Dice similarity is a binary scoring metric. It can be applied to anything...
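To make the "binary scoring metric" point concrete, here is a minimal sketch of the coefficient computed over two sets of arbitrary elements, with character bigrams as just one possible way to build those sets. The class `DiceSketch` and its methods are hypothetical names used for illustration and are not the code from the PR; returning 1 when both sets are empty is an assumed convention, not a settled one.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Commons Text. */
final class DiceSketch {

    /** Splits a sequence into its set of distinct character bigrams. */
    static Set<String> bigrams(CharSequence cs) {
        Set<String> set = new HashSet<>();
        for (int i = 0; i + 1 < cs.length(); i++) {
            set.add(cs.subSequence(i, i + 2).toString());
        }
        return set;
    }

    /**
     * Sorensen-Dice coefficient: 2 * TP / (2 * TP + FP + FN),
     * which for sets is 2 * |intersection| / (|A| + |B|).
     */
    static <T> double dice(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumed convention for the 0 / 0 case
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b); // true positives: elements present in both sets
        return 2.0 * intersection.size() / (a.size() + b.size());
    }
}
```

For example, "night" and "nacht" each produce four distinct bigrams and share only "ht", so `DiceSketch.dice(DiceSketch.bigrams("night"), DiceSketch.bigrams("nacht"))` gives 2 * 1 / (4 + 4) = 0.25. Because `dice` is generic, the same method works for sets of words, characters, or any other element type.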

I've had more of a think and have mocked an implementation for a generic `IntersectionSimilarity`. See [TEXT-155](https://issues.apache.org/jira/projects/TEXT/issues/TEXT-155) and the code in the linked PR. This class can be used to...

I have just noticed that the `JaccardSimilarity` computes 0 for the similarity of zero length input sequences. This computes 1 so it should be changed.

I'll try and summarise where we are: 1. It is not clear what the result of 0 / 0 should be. This is the `"" vs ""` case. Possibilities: ```...
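As a rough illustration of why the 0 / 0 case needs an explicit decision: in Java, floating-point `0.0 / 0.0` evaluates to `NaN`, so a ratio-based similarity has to pick a value for two empty inputs. The sketch below makes that choice a parameter; `ZeroOverZeroSketch`, `ratio`, and `emptyResult` are hypothetical names, and nothing here says which value is the right one.

```java
/** Hypothetical helper for illustration; not part of Commons Text. */
final class ZeroOverZeroSketch {

    /**
     * Computes intersection / union, returning the caller's chosen
     * value when the union is empty (the "" vs "" case).
     */
    static double ratio(int intersection, int union, double emptyResult) {
        if (union == 0) {
            return emptyResult; // explicit decision instead of 0.0 / 0.0 = NaN
        }
        return (double) intersection / union;
    }
}
```

Calling `ratio(0, 0, 1.0)` treats two empty inputs as identical, while `ratio(0, 0, 0.0)` treats them as having nothing in common; non-empty inputs are unaffected by the choice.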

Overall, this class now works nicely, but I think that my initial reservations have not been addressed. There is no reason that this should use bigrams, or allow duplicate bigrams....

I think the build fails because the maven-javadoc-plugin configuration section in build does not have this: ``` ${maven.compiler.source} ``` The plugin is configured in commons-parent. It has this tag for...

The pom.xml formatting was a mess. I have reformatted it in master to remove tabs and use 2 spaces for the indent. Can you rebase so only the changes you...

@garydgregory I disagree with the test being harder to comprehend. This actually seems like a useful addition as all assertions will run and you get a report on which failed....

As usual, there is more than one way to do things. Many of these tests repeat the same test with different arguments and are thus more suitable for a `ParameterizedTest` with only...