csvmatch
🔎 Finds fuzzy matches between CSV files
There's a list here: http://ntz-develop.blogspot.co.uk/2011/03/fuzzy-string-search.html
One option to specify the fields used to create the blocks, another (?) to set the method: default to exact match, with options for Metaphone etc. Interesting bit on Soundex...
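A rough sketch of what the blocking idea above could look like, not how `csvmatch` does it. The names `build_blocks`, `fields` and `method` are hypothetical. Rows are grouped by a key built from the chosen fields, the key is the exact value by default, and a hand-written American Soundex stands in for the phonetic option.

```python
from collections import defaultdict

# Letter-to-digit table for American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word):
    """Return the four-character American Soundex code for a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept after them
    return (result + "000")[:4]


def build_blocks(rows, fields, method=None):
    """Group rows by a key derived from the given fields (hypothetical helper).

    With method=None the key is the exact field value; passing a function
    such as soundex makes the blocks looser.
    """
    keyfunc = method or (lambda value: value)
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(keyfunc(row[field]) for field in fields)
        blocks[key].append(row)
    return dict(blocks)


rows = [{"name": "Robert"}, {"name": "Rupert"}, {"name": "Alice"}]
exact_blocks = build_blocks(rows, ["name"])
phonetic_blocks = build_blocks(rows, ["name"], method=soundex)
```

Only rows that share a block would then need to be compared with the fuzzy matcher, which is the point of blocking.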
For the purpose of matching strings with helpful numbers and unhelpful words (such as precinct names with codes and messy names), adding an "ignore_letters" option would be nice. This would...
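One possible reading of the "ignore_letters" idea, as a sketch only. The function below is hypothetical and is not an existing `csvmatch` option. It drops alphabetic characters before comparison and leaves digits and punctuation alone.

```python
def ignore_letters(value):
    """Drop alphabetic characters, then collapse leftover whitespace.

    Hypothetical helper: digits and punctuation are kept as they are.
    """
    stripped = "".join(ch for ch in value if not ch.isalpha())
    return " ".join(stripped.split())


left = ignore_letters("Precinct 042 Westside")
right = ignore_letters("042 WESTSIDE")
same_code = left == right
```

Punctuation would still get in the way (a stray hyphen or full stop survives), so this would probably need to be combined with a separate punctuation-ignoring step.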
eg:
* generators and generator comprehensions
* `cStringIO`, and other C-versions
* streaming output? (probably required before #11)
* write some performance tests
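A minimal sketch of the generator and streaming-output items, assuming Python 3 (where `io.StringIO` replaces `cStringIO`). `read_rows` and `write_rows` are hypothetical names. Rows are read lazily and written as they are produced, so the whole file never has to sit in a list.

```python
import csv
import io


def read_rows(handle):
    """Yield one dict per CSV row without loading the whole file."""
    for row in csv.DictReader(handle):
        yield row


def write_rows(rows, handle, fieldnames):
    """Write rows as they arrive and return how many were written."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-ins for real files.
source = io.StringIO("name,code\nalice,1\nbob,2\n")
target = io.StringIO()

# Generator expression: each row is transformed only when the writer asks for it.
upper = ({"name": r["name"].upper(), "code": r["code"]} for r in read_rows(source))
written = write_rows(upper, target, ["name", "code"])
```

Fuzzy matching itself still needs one side available for comparison, so this only covers the input and output ends.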
Could use something like Apache Arrow so files don't have to be kept in memory?
A la https://news.ycombinator.com/item?id=18613806
Using Nameparser? https://pypi.python.org/pypi/nameparser
A la how `csvlink` and `csvmatch` do by default.