fuzzywuzzy-rs
Add Opt-In Unicode Support
Essentially all of the algorithms in this crate are poorly suited to Unicode because they iterate over the chars in the string instead of the grapheme clusters.
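To make the mismatch concrete, here is a minimal std-only sketch. The helper name `scalar_count` is made up for illustration and is not part of this crate. It counts `char`s (Unicode scalar values), which is the unit the algorithms currently see.

```rust
/// Hypothetical helper: number of `char`s (Unicode scalar values) in `s`.
/// This is the unit a `chars()`-based algorithm works with.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical helper: true if the two strings are identical `char` by `char`.
fn same_scalars(a: &str, b: &str) -> bool {
    a.chars().eq(b.chars())
}
```

For example, `scalar_count("\u{e9}")` is 1 (precomposed "é") while `scalar_count("e\u{301}")` is 2 ("e" plus a combining acute accent), and `same_scalars` reports the two as different even though both render as a single "é".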
https://crates.io/crates/unicode-segmentation is the semi-official Rust crate for Unicode segmentation. I don't have a good option for detecting homoglyphs yet, but homoglyph detection (or custom equality on the clusters) combined with all of the usual algorithms should be what we need for full support.
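One possible shape for "custom equality on the clusters" is an edit distance that is generic over the token type and takes the equality as a parameter. The sketch below is std-only and is not this crate's API: `levenshtein_by`, `char_distance`, and `toy_cluster_eq` are hypothetical names, and the single-entry homoglyph table is a toy assumption, not real homoglyph detection.

```rust
/// Levenshtein distance over arbitrary tokens, using a caller-supplied
/// equality. Tokens could be `char`s, or `&str` grapheme clusters.
fn levenshtein_by<T, F>(a: &[T], b: &[T], eq: F) -> usize
where
    F: Fn(&T, &T) -> bool,
{
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1); // distance from a[..=i] to the empty prefix of b
        for (j, y) in b.iter().enumerate() {
            let cost = if eq(x, y) { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute or match
                .min(prev[j + 1] + 1)   // delete x
                .min(cur[j] + 1);       // insert y
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// The current behaviour in spirit: compare `char` by `char`.
fn char_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    levenshtein_by(&a, &b, |x, y| x == y)
}

/// Toy cluster equality (assumption): treats Cyrillic 'а' (U+0430) as
/// equal to Latin 'a'. A real table would be far larger.
fn toy_cluster_eq(x: &str, y: &str) -> bool {
    fn fold(s: &str) -> &str {
        match s {
            "\u{430}" => "a",
            other => other,
        }
    }
    fold(x) == fold(y)
}
```

With this, `char_distance("\u{e9}", "e\u{301}")` is 2 even though both strings display as "é". If the strings are instead split into clusters, for instance with `UnicodeSegmentation::graphemes(s, true)` from unicode-segmentation collected into a `Vec<&str>`, the same function can be called as `levenshtein_by(&a, &b, |x, y| toy_cluster_eq(x, y))`. Segmentation alone still leaves the cluster "e\u{301}" unequal to the cluster "\u{e9}", so the equality function (or a prior normalization pass) has to handle that case.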
A quick search does not yield promising results for libraries providing Unicode equivalence functions. The 'best' option might be to normalize both strings with https://crates.io/crates/unicode-normalization and then compare the normalized forms instead. I'm unsure whether segmentation is still required at that point.
This would really be great! As of right now, this crate is kinda unusable for me, sadly.