stringdist
Inconsistent Jaccard similarity of empty strings on macOS
The behaviour below is inconsistent on my Mac. I cannot reproduce the inconsistency on Ubuntu, where the results are mostly consistent.
Here is the Jaccard similarity of two empty strings, first as arguments to the stringsim
function, and then as components of a vector.
> x <- stringdist::stringsim("","",method="jaccard")
> str(x)
num 1
> y <- stringdist::stringsim(c("y",""),c("y",""),method="jaccard")
> str(y)
num [1:2] 1 NaN
Here is another example of inconsistent behaviour:
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 5)
[1] 1 1
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 3)
[1] 1 NaN
> stringdist::stringsim( c("foo","ac"), c("foo","bc"), method = "jaccard", q = 1)
[1] 1.0000000 0.3333333
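For comparison, here is a minimal base-R sketch of the textbook set-based Jaccard similarity on q-grams. It is not stringdist's implementation (I have not checked the package source), and the helper names `qgrams` and `jaccard_sim` are made up for illustration. It shows that when both strings are shorter than q, or both are empty, the two q-gram sets are empty and the ratio becomes 0/0, which R evaluates to NaN. That may be where the NaN values above come from.

```r
# Naive set-based Jaccard similarity on q-grams, base R only.
# NOT stringdist's code; qgrams() and jaccard_sim() are hypothetical helpers.
qgrams <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))  # no q-grams if the string is shorter than q
  unique(substring(s, 1:(n - q + 1), q:n))
}

jaccard_sim <- function(a, b, q = 1) {
  A <- qgrams(a, q)
  B <- qgrams(b, q)
  # |A intersect B| / |A union B|; 0/0 evaluates to NaN in R
  length(intersect(A, B)) / length(union(A, B))
}

jaccard_sim("ac", "bc", q = 1)  # sets {a, c} and {b, c}: 1 shared out of 3
jaccard_sim("ac", "bc", q = 3)  # both q-gram sets are empty: 0/0
jaccard_sim("", "", q = 1)      # both q-gram sets are empty: 0/0
```

Under this naive definition the q = 5 call would also be 0/0 for both pairs, so the `1 1` result there suggests the package treats that case differently from q = 3.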
I tried this with a fresh install of the stringdist package:
$ R
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20 (64-bit)
> packageVersion('stringdist')
[1] ‘0.9.10’