VerticaPy
Jaro-Winkler distance
Hi,
In several projects, we would like to use the Jaro-Winkler distance.
This method is implemented in the Jellyfish library, and we think it would be interesting to add it to Vertica and/or VerticaPy.
The method is expensive to run on a single node, because the calculation has to find all matches and transpositions between the 2 strings.
We know Vertica already has a Levenshtein distance, but Jaro-Winkler also gives good results, and its result is normalized between 0 and 1, which makes comparison and interpretation easier.
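To make the matches/transpositions step and the 0-to-1 normalization concrete, here is a minimal pure-Python sketch of the usual textbook formulation. It is not the Jellyfish or Vertica implementation; the function names and the default parameters (`prefix_weight=0.1`, `max_prefix=4`) are assumptions taken from the common definition. It returns a similarity (1.0 means identical strings); a distance can be taken as `1 - similarity`.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (textbook formulation, illustrative only)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters match only if they are equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each pair that
    # differs is "out of order". Transpositions are half that count.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler_similarity(
    s1: str, s2: str, prefix_weight: float = 0.1, max_prefix: int = 4
) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1.0 - jaro)


if __name__ == "__main__":
    print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
```

The nested loop over the match window is what makes the computation costly on long strings and large tables, which is why running it in-database across nodes is attractive.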
Jaro-Winkler is used in several use cases that compare 2 strings, for example:
- Detecting duplicate values (such as mistyped names...)
- Replacing strings with normalized strings (such as company names...), which makes it possible to join with external reference datasets such as INSEE
- ....
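As a rough illustration of the normalization use case, the sketch below maps a raw name to the closest entry of a reference list when the score passes a threshold. Everything in it is hypothetical: the helper `normalize_name`, the threshold of 0.8, and the sample company names. To keep the block runnable on its own, the default scorer is `difflib.SequenceMatcher.ratio()` from the standard library, which is a stand-in and not Jaro-Winkler; any similarity normalized to [0, 1] can be passed in its place.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


def stand_in_similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]. This is NOT Jaro-Winkler."""
    return SequenceMatcher(None, a, b).ratio()


def normalize_name(
    raw: str,
    reference: Iterable[str],
    similarity: Callable[[str, str], float] = stand_in_similarity,
    threshold: float = 0.8,  # assumed cut-off, to be tuned on real data
) -> Optional[str]:
    """Return the best-scoring reference name, or None if below the threshold."""
    cleaned = raw.strip().lower()
    best_name, best_score = None, 0.0
    for candidate in reference:
        score = similarity(cleaned, candidate.lower())
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical reference list; in practice this would come from an
    # external dataset.
    reference_names = ["Acme Industries", "Globex Corporation"]
    print(normalize_name("Acme Industires", reference_names))
    print(normalize_name("Unrelated Name", reference_names))
```

Because the score is normalized, one threshold can be applied to short and long names alike, and the normalized value can then be used as a join key.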
Jaro-Winkler is on its way. It should be available soon.