StringCompare
Address matching case study
Create a user example (see #2) which shows how StringCompare can be used to match business names.
That is, suppose we have a long list L of business names. Given another business name provided by a user, we want to be able to find the name in L which most closely matches it.
We can address this problem in a few steps:
1. Identify an open dataset to work with for the case study.
2. Identify a string comparison function which works best to match similar business names.
3. Implement the brute force solution which computes all distances and returns the closest match.
4. Try to speed up computation using indexing/blocking.
5. Try to speed up computation using B-trees.
6. Try to find quick approximate solutions using locality sensitive hashing.
Steps 1-3 are the most important. Steps 4-6 can be explored if they seem interesting.
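As a rough illustration of the brute force and blocking steps, here is a minimal Python sketch. It does not use the StringCompare API. Levenshtein distance is only a stand-in, since the choice of comparison function is still open. The normalization rule, the trigram blocking key, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed, minimal cleanup)."""
    return " ".join(name.lower().split())


def closest_match(query: str, names: list[str]) -> str:
    """Brute force: score every candidate and keep the smallest distance."""
    q = normalize(query)
    return min(names, key=lambda n: levenshtein(q, normalize(n)))


def trigrams(s: str) -> set[str]:
    """Character trigrams of a padded string, used here as blocking keys."""
    padded = f"  {s} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(names: list[str]) -> dict[str, set[int]]:
    """Map each trigram to the positions of the names that contain it."""
    index = defaultdict(set)
    for pos, name in enumerate(names):
        for gram in trigrams(normalize(name)):
            index[gram].add(pos)
    return index


def closest_match_blocked(query: str, names: list[str], index) -> str:
    """Blocking: only score names sharing at least one trigram with the query."""
    candidates = set()
    for gram in trigrams(normalize(query)):
        candidates |= index.get(gram, set())
    # Fall back to the full list if no candidate shares a trigram.
    pool = [names[i] for i in sorted(candidates)] or names
    return closest_match(query, pool)


# Hypothetical business names, for illustration only.
names = ["Acme Hardware Ltd", "Zenith Bakery", "Northwind Traders Inc"]
index = build_index(names)
print(closest_match("acme hardwre ltd", names))
print(closest_match_blocked("Northwind Trader", names, index))
```

The blocked version returns the same answer as brute force whenever the true closest name shares at least one trigram with the query. Otherwise it can miss the true match, and that trade-off is worth measuring on the chosen dataset.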
References
- https://corpus.ulaval.ca/jspui/bitstream/20.500.11794/67747/1/36572.pdf