stringdist

String distance functions for R

Results 20 stringdist issues

```R
# first pattern has nchar < nchar of x: no problem
> stringdist::afind("ab", c("a", "b"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
> stringdist::afind("ab", c("a", "c"), method =...
```

Running over vectors of 100k integers produces stack imbalance warnings at best and aborts the R session at worst:
```r
# Two vectors of 100k random integers 1-12
d1...
```

bug

The behaviour below is inconsistent on my Mac; on Ubuntu the results are mostly consistent. I cannot reproduce the inconsistency on Ubuntu, but on macOS, see below. Here is the...

Consider the correct NA result of:
```
stringdist("", "XXX", method = "cos", q = 3)
```
However, if both a and b have nchar() < q, the...
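To make the `NA` case concrete, here is a minimal base-R sketch of a q-gram cosine distance. It is not the package's implementation: `qgrams_of` and `cosine_qgram` are hypothetical helper names, and returning `NA` whenever either string is shorter than `q` is an assumption made for illustration (the cosine is undefined there because an empty profile has zero norm).

```r
# Hypothetical helper: all overlapping q-grams of a single string.
qgrams_of <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))      # string too short: no q-grams
  starts <- seq_len(n - q + 1)
  substring(s, starts, starts + q - 1)
}

# Hypothetical helper: cosine distance between two q-gram count profiles.
# Assumption: return NA when either profile is empty (0/0 is undefined).
cosine_qgram <- function(a, b, q) {
  ga <- qgrams_of(a, q)
  gb <- qgrams_of(b, q)
  if (length(ga) == 0 || length(gb) == 0) return(NA_real_)
  keys <- union(ga, gb)
  va <- as.numeric(table(factor(ga, levels = keys)))  # counts aligned on keys
  vb <- as.numeric(table(factor(gb, levels = keys)))
  1 - sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

cosine_qgram("", "XXX", q = 3)     # NA: the empty string has no 3-grams
cosine_qgram("XXX", "XXX", q = 3)  # identical profiles
```

In this sketch the case where both strings are shorter than `q` also yields `NA`; whether that matches the package's behaviour is exactly what the issue discusses.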

I'm writing a function to fuzzy-match records in two datasets according to the following criteria, where a match will only be recorded as such if there is a positive match...

Let s_vec be some vector of N distinct strings. When N is too large, stringdistmatrix grows unwieldy (NxN), as does the "dist" object returned by stringdistmatrix when called with a...

Now that the `C` functions have been exported, and CRAN is stricter about macros like `INTEGER()`, it would be good to have input type checking/conversion of `SEXP` objects at...

enhancement

Thanks for the efficiently implemented package! For the alignment-based methods, I'm wondering whether it would be possible to return the scoring matrix or some other data structure representing the optimal alignment(s)...
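As an illustration of what such a scoring matrix looks like, the following base-R sketch fills the standard dynamic-programming table for the unit-cost Levenshtein distance. `levenshtein_matrix` is a hypothetical name and this is not how the package computes or exposes anything; it only shows the kind of structure the request refers to.

```r
# Hypothetical helper: full dynamic-programming table for the
# unit-cost Levenshtein distance between two single strings.
levenshtein_matrix <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  m <- length(x)
  n <- length(y)
  D <- matrix(0L, nrow = m + 1, ncol = n + 1,
              dimnames = list(c("", x), c("", y)))
  D[, 1] <- 0:m   # deleting i characters of a
  D[1, ] <- 0:n   # inserting j characters of b
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      D[i + 1, j + 1] <- min(D[i, j + 1] + 1L,   # deletion
                             D[i + 1, j] + 1L,   # insertion
                             D[i, j] + cost)     # match or substitution
    }
  }
  D
}

D <- levenshtein_matrix("kitten", "sitting")
D[nrow(D), ncol(D)]  # bottom-right cell holds the edit distance
```

An optimal alignment can be read off by tracing back from the bottom-right cell to the top-left, at each step moving to a neighbouring cell that produced the current minimum.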

Tom Magerman suggested by e-mail adding the following to the q-gram distances:
- [Dice Coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)
- [Sorensen overlap](https://en.wikipedia.org/wiki/Overlap_coefficient)
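For reference, both coefficients are easy to state on q-gram sets: Dice is 2|A∩B| / (|A| + |B|) and the overlap coefficient is |A∩B| / min(|A|, |B|). The base-R sketch below is an illustration under assumptions, not a proposal for the package's API: the function names are hypothetical, q-grams are treated as sets rather than counted multisets, and `NA` is returned when a denominator would be zero.

```r
# Hypothetical helper: the set of distinct q-grams of a single string.
qgram_set <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))
  starts <- seq_len(n - q + 1)
  unique(substring(s, starts, starts + q - 1))
}

# Dice coefficient on q-gram sets: 2|A n B| / (|A| + |B|).
dice_coefficient <- function(a, b, q = 2) {
  A <- qgram_set(a, q)
  B <- qgram_set(b, q)
  if (length(A) + length(B) == 0) return(NA_real_)  # assumption: undefined
  2 * length(intersect(A, B)) / (length(A) + length(B))
}

# Overlap coefficient on q-gram sets: |A n B| / min(|A|, |B|).
overlap_coefficient <- function(a, b, q = 2) {
  A <- qgram_set(a, q)
  B <- qgram_set(b, q)
  if (min(length(A), length(B)) == 0) return(NA_real_)  # assumption: undefined
  length(intersect(A, B)) / min(length(A), length(B))
}

dice_coefficient("night", "nacht")     # the two words share only "ht"
overlap_coefficient("night", "nacht")
```

Both are similarities in [0, 1]; a distance in the same spirit as the other q-gram methods would be one minus the coefficient.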

enhancement

For example
```
stringdist("hello", "world", method = "cosine", q = 1:2)
```
would yield the cosine distance over the concatenation of 1-gram and 2-gram profiles. This would also enhance compatibility, e.g. with the `textcat` package.
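The proposed semantics can be sketched in base R without touching the package. The helper names `pooled_qgrams` and `cosine_multi_q` are hypothetical, and pooling the q-grams of every requested size into one count profile is one reading of "concatenation" here (1-grams and 2-grams never collide because they differ in length).

```r
# Hypothetical helper: q-grams of `s` for every q in `qs`, pooled together.
pooled_qgrams <- function(s, qs) {
  n <- nchar(s)
  unlist(lapply(qs, function(q) {
    if (n < q) return(character(0))
    starts <- seq_len(n - q + 1)
    substring(s, starts, starts + q - 1)
  }))
}

# Hypothetical helper: cosine distance over the pooled profiles.
cosine_multi_q <- function(a, b, qs) {
  ga <- pooled_qgrams(a, qs)
  gb <- pooled_qgrams(b, qs)
  if (length(ga) == 0 || length(gb) == 0) return(NA_real_)
  keys <- union(ga, gb)
  va <- as.numeric(table(factor(ga, levels = keys)))
  vb <- as.numeric(table(factor(gb, levels = keys)))
  1 - sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

cosine_multi_q("hello", "world", qs = 1:2)
```

For "hello" and "world" the only shared q-grams are the single characters "l" and "o", so the 2-gram part of the profile contributes to the norms but not to the dot product.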

enhancement