
Add Opt-In Unicode Support

Open logannc opened this issue 5 years ago • 2 comments

Essentially all of the algorithms in this crate are poorly suited to Unicode because they iterate over the chars in the string instead of the grapheme clusters.
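To illustrate the problem, here is a minimal std-only sketch. `char_levenshtein` is a hypothetical helper written for this example, not this crate's actual implementation; it only assumes that the distance is computed over `char`s as described above.

```rust
/// Levenshtein distance computed over `char`s (Unicode scalar values).
/// Hypothetical helper for illustration; not taken from this crate.
fn char_levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the first i chars of `a`
    // and the first j chars of `b` for the previous row i.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // substitution/match, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

With this helper, `char_levenshtein("caf\u{e9}", "cafe\u{301}")` is 2 even though both strings render as "café": the precomposed form has 4 chars and the decomposed form has 5.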

https://crates.io/crates/unicode-segmentation is the semi-official Rust crate for Unicode segmentation. I don't have a good option for detecting homoglyphs yet. But homoglyph detection / custom equality on the clusters, in combination with all of the usual algorithms, should be what we need for full support.
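A rough sketch of what "custom equality on the clusters" could look like, using only std. `levenshtein_by` is a hypothetical name, not an existing API of this crate; it assumes the caller has already split both strings into clusters (for example with unicode-segmentation) and supplies the equality predicate.

```rust
/// Levenshtein distance over arbitrary tokens with a caller-supplied
/// equality predicate. Hypothetical sketch: the tokens are assumed to be
/// grapheme clusters produced elsewhere, and `same` stands in for
/// whatever homoglyph or equivalence check is eventually chosen.
fn levenshtein_by<T, F>(a: &[T], b: &[T], same: F) -> usize
where
    F: Fn(&T, &T) -> bool,
{
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, y) in b.iter().enumerate() {
            let cost = if same(x, y) { 0 } else { 1 };
            // substitution/match, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Passing plain `==` as `same` reproduces the usual behavior over clusters, while a predicate that treats selected clusters as equivalent makes them cost nothing to match. The algorithm itself stays unchanged, which is the appeal of this design.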

logannc avatar Oct 13 '20 20:10 logannc

A quick search does not yield promising results for libraries providing Unicode equivalence functions. The 'best' option might be to normalize both strings with https://crates.io/crates/unicode-normalization and then compare the normalized forms instead. I'm unsure whether segmentation is still required at that point.

logannc avatar Oct 13 '20 20:10 logannc

This would really be great! As of right now, this crate is kinda unusable for me, sadly.

HalfVoxel avatar Jan 30 '22 22:01 HalfVoxel