stringdist

Inconsistent Jaccard similarity of empty strings on MacOS

Open · opened by pieterhartel

The behaviour below is inconsistent on my Mac (MacOS). On Ubuntu the results are mostly consistent, and I have not been able to reproduce the inconsistency there.

Here is the Jaccard similarity of two empty strings, first passed directly as arguments to the stringsim function, and then as the second elements of two character vectors.

> x <- stringdist::stringsim("","",method="jaccard")
> str(x)
 num 1
> y <- stringdist::stringsim(c("y",""),c("y",""),method="jaccard")
> str(y)
 num [1:2] 1 NaN

Here is another example of inconsistent behaviour, this time varying the q-gram size q:

> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 5)
[1] 1 1
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 3)
[1]   1 NaN
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 1)
[1] 1.0000000 0.3333333
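For comparison, here is a minimal base-R sketch of the plain set-based q-gram Jaccard similarity, |A ∩ B| / |A ∪ B|, where A and B are the sets of distinct q-grams of each string. The helpers `qgrams_of`, `jaccard_sim` and `jaccard_sim_vec` are hypothetical and are not how stringdist computes the similarity. Under this definition the result is 0/0, i.e. undefined, whenever both strings are shorter than q, which includes a pair of empty strings.

```r
# Hypothetical helpers, NOT part of stringdist: the plain set-based q-gram
# Jaccard similarity length(intersect(A, B)) / length(union(A, B)), base R only.

# Distinct q-grams of a single string; empty when the string is shorter than q.
qgrams_of <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))
  unique(substring(s, 1:(n - q + 1), q:n))
}

# Similarity of one pair of strings. When both q-gram sets are empty,
# this evaluates 0/0, which R returns as NaN.
jaccard_sim <- function(a, b, q = 1) {
  A <- qgrams_of(a, q)
  B <- qgrams_of(b, q)
  length(intersect(A, B)) / length(union(A, B))
}

# Element-wise version for two character vectors of equal length.
jaccard_sim_vec <- function(a, b, q = 1) {
  unname(mapply(jaccard_sim, a, b, MoreArgs = list(q = q)))
}

jaccard_sim("", "")                                    # NaN
jaccard_sim_vec(c("foo", "ac"), c("foo", "bc"), q = 5) # NaN NaN
jaccard_sim_vec(c("foo", "ac"), c("foo", "bc"), q = 3) # 1 NaN
jaccard_sim_vec(c("foo", "ac"), c("foo", "bc"), q = 1) # 1.0000000 0.3333333
```

Under this plain definition, every surprising result above (the empty-string pair, both pairs at q = 5, and "ac" vs "bc" at q = 3) is one of these 0/0 cases, while the fully defined results (q = 1, and "foo" vs "foo" at q = 3) agree with what stringsim returned.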

I tried this with a fresh install of the stringdist package:

$ R
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20 (64-bit)
> packageVersion('stringdist')
[1] ‘0.9.10’

pieterhartel, Sep 06 '23 09:09