comparator

Similarity and distance measures for clustering and record linkage applications in R

3 comparator issues

@ngmarchant the Levenshtein distance can be implemented using only two rows for `dmat`, instead of using a square matrix. That could significantly reduce memory usage when comparing long sequences (400...
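The two-row idea in this issue can be sketched as follows. This is a minimal illustration, not the package's code: the function name `levenshtein_two_rows` is hypothetical, unit costs are assumed for insertion, deletion and substitution, and strings are compared byte by byte rather than by Unicode code point.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Levenshtein distance with unit costs, keeping only
// the previous and current rows of the dynamic-programming table.
std::size_t levenshtein_two_rows(const std::string& a, const std::string& b) {
    // Unit-cost Levenshtein is symmetric, so let the shorter string index
    // the columns. Each row then has min(|a|, |b|) + 1 entries.
    const std::string& s = a.size() >= b.size() ? a : b;  // rows
    const std::string& t = a.size() >= b.size() ? b : a;  // columns

    std::vector<std::size_t> prev(t.size() + 1), curr(t.size() + 1);

    // Row 0: turning the empty prefix of s into t[0..j) takes j insertions.
    for (std::size_t j = 0; j <= t.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= s.size(); ++i) {
        // Column 0: turning s[0..i) into the empty string takes i deletions.
        curr[0] = i;
        for (std::size_t j = 1; j <= t.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        // The current row becomes the previous row for the next iteration.
        std::swap(prev, curr);
    }
    return prev[t.size()];
}
```

Calling `levenshtein_two_rows("kitten", "sitting")` returns 3. Memory grows with the length of the shorter string instead of with the product of both lengths, at the cost of no longer being able to trace back the edit path from the full table.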

Consider adding support for token-based comparators. After mapping strings to token sets, the similarity of the sets can be measured using:

* Cosine similarity
* Sørensen–Dice coefficient
* Jaccard index...
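As a rough sketch of what such token-based comparators might compute, the snippet below splits strings on whitespace and applies the three set measures named in this issue. All names here (`TokenSet`, `tokenize`, `jaccard`, `dice`, `cosine`) are hypothetical and not part of the package. The whitespace tokenizer and the convention that two empty sets have similarity 1 are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

using TokenSet = std::set<std::string>;

// Assumed tokenizer: split on whitespace and discard duplicate tokens.
TokenSet tokenize(const std::string& text) {
    std::istringstream in(text);
    TokenSet tokens;
    std::string token;
    while (in >> token) tokens.insert(token);
    return tokens;
}

// Number of tokens present in both sets.
std::size_t intersection_size(const TokenSet& a, const TokenSet& b) {
    std::size_t n = 0;
    for (const auto& token : a) n += b.count(token);
    return n;
}

// Jaccard index: |A ∩ B| / |A ∪ B|.
double jaccard(const TokenSet& a, const TokenSet& b) {
    if (a.empty() && b.empty()) return 1.0;  // assumed convention
    const double inter = static_cast<double>(intersection_size(a, b));
    return inter / (static_cast<double>(a.size() + b.size()) - inter);
}

// Sørensen–Dice coefficient: 2 |A ∩ B| / (|A| + |B|).
double dice(const TokenSet& a, const TokenSet& b) {
    if (a.empty() && b.empty()) return 1.0;  // assumed convention
    const double inter = static_cast<double>(intersection_size(a, b));
    return 2.0 * inter / static_cast<double>(a.size() + b.size());
}

// Cosine similarity of the sets viewed as binary indicator vectors:
// |A ∩ B| / sqrt(|A| * |B|).
double cosine(const TokenSet& a, const TokenSet& b) {
    if (a.empty() && b.empty()) return 1.0;  // assumed convention
    if (a.empty() || b.empty()) return 0.0;
    const double inter = static_cast<double>(intersection_size(a, b));
    return inter / std::sqrt(static_cast<double>(a.size()) *
                             static_cast<double>(b.size()));
}
```

For `"the quick brown fox"` and `"the lazy brown dog"`, the token sets share two of four tokens each, so the Jaccard index is 2/6 and both the Dice coefficient and the cosine similarity are 0.5.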

These measures are currently implemented in R. Porting to C++ is challenging, as it may be necessary to call an R function (the inner measure) from C++.