stringdist

Running cosine gives wrong distances when the length of the first pattern is greater than or equal to the length of x

Open: jordivandooren opened this issue 7 months ago • 1 comment

# first pattern has nchar < nchar of x: no problem
> stringdist::afind("ab", c("a", "b"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
> stringdist::afind("ab", c("a", "c"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    1
> stringdist::afind("ab", c("a", "ab"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0

# first pattern has nchar >= nchar of x: unexpected results
> stringdist::afind("ab", c("ab", "a"), method = "running_cosine")$distance
     [,1]       [,2]
[1,]    0 0.06066017
> stringdist::afind("ab", c("xx", "a"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    1    1

# example where the match is wrong: pattern "b" should match the "b" in "ab" (location 2, distance 0)
> stringdist::afind("ab", c("xx", "b"), method = "running_cosine")
$location
     [,1] [,2]
[1,]    1    1

$distance
     [,1] [,2]
[1,]    1    1

$match
     [,1] [,2]
[1,] "ab" "a" 

The unexpected results occur on multiple versions/platforms (tested on Linux R 4.1, Windows R 4.2).
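
For reference, the values I would expect can be reproduced with a brute-force check in base R. This is only a sketch under two assumptions: that the window width defaults to nchar(pattern), and that distances use single-character (q = 1) profiles. The helpers `cosine_dist` and `brute_force_find` are hypothetical names made up for this check; they are not part of stringdist.

```r
# Cosine distance between the single-character (q = 1) count profiles
# of two strings. Hypothetical helper, not a stringdist function.
cosine_dist <- function(a, b) {
  chars_a <- strsplit(a, "")[[1]]
  chars_b <- strsplit(b, "")[[1]]
  alphabet <- union(chars_a, chars_b)
  va <- as.numeric(table(factor(chars_a, levels = alphabet)))
  vb <- as.numeric(table(factor(chars_b, levels = alphabet)))
  1 - sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

# Slide a window of width nchar(pattern) over x (assumed default window)
# and return the window with the smallest cosine distance.
brute_force_find <- function(x, pattern) {
  w <- min(nchar(pattern), nchar(x))
  starts <- seq_len(nchar(x) - w + 1)
  windows <- substring(x, starts, starts + w - 1)
  d <- vapply(windows, cosine_dist, numeric(1), b = pattern,
              USE.NAMES = FALSE)
  best <- which.min(d)
  list(location = starts[best], distance = d[best], match = windows[best])
}

# Same inputs as the last afind() call above.
res_xx <- brute_force_find("ab", "xx")
res_b  <- brute_force_find("ab", "b")
print(res_b)
```

Under those assumptions, pattern "xx" still gives distance 1 at location 1, but pattern "b" gives location 2, distance 0 and match "b", independent of which pattern comes first.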

jordivandooren, Nov 22 '23 21:11