stringdist
Running cosine gives wrong distances when the length of the first pattern is greater than or equal to the length of x
# first pattern has nchar < nchar of x: no problem
> stringdist::afind("ab", c("a", "b"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
> stringdist::afind("ab", c("a", "c"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    1
> stringdist::afind("ab", c("a", "ab"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
# first pattern has nchar >= nchar of x: unexpected results
> stringdist::afind("ab", c("ab", "a"), method = "running_cosine")$distance
     [,1]       [,2]
[1,]    0 0.06066017
> stringdist::afind("ab", c("xx", "a"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    1    1
# example where match is wrong (b should match b with distance 0)
> stringdist::afind("ab", c("xx", "b"), method = "running_cosine")
$location
     [,1] [,2]
[1,]    1    1

$distance
     [,1] [,2]
[1,]    1    1

$match
     [,1] [,2]
[1,] "ab" "a"
The unexpected results occur on multiple versions/platforms (tested on Linux R 4.1, Windows R 4.2).
Looks similar to https://github.com/markvanderloo/stringdist/issues/96
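
As a cross-check on the expected values in the examples above, here is a minimal base-R sketch that computes the cosine distance by brute force for every window. It is not how stringdist implements `running_cosine`. It assumes unigram profiles (q = 1) and a window equal to the pattern length, capped at `nchar(x)`. The helper names `cosine_dist` and `min_window_dist` are hypothetical and not part of stringdist.

```r
# Cosine distance between the unigram (q = 1) count profiles of two strings.
# Assumption: q = 1; this is a brute-force reference, not stringdist's code.
cosine_dist <- function(a, b) {
  chars_a <- strsplit(a, "")[[1]]
  chars_b <- strsplit(b, "")[[1]]
  alphabet <- union(chars_a, chars_b)
  va <- as.numeric(table(factor(chars_a, levels = alphabet)))
  vb <- as.numeric(table(factor(chars_b, levels = alphabet)))
  1 - sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

# Smallest distance between `pattern` and any window of `x`.
# Assumption: window = nchar(pattern), capped at nchar(x).
min_window_dist <- function(x, pattern) {
  w <- min(nchar(pattern), nchar(x))
  starts <- seq_len(nchar(x) - w + 1)
  windows <- substring(x, starts, starts + w - 1)
  min(vapply(windows, cosine_dist, numeric(1), b = pattern))
}

# Patterns from the failing examples
sapply(c("ab", "a"), min_window_dist, x = "ab")
sapply(c("xx", "b"), min_window_dist, x = "ab")
```

Under these assumptions the first call gives 0 for both patterns and the second gives 1 for "xx" and 0 for "b", regardless of the order in which the patterns are passed.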