jellyfish
Adding QWERTY support to DL distance
This adjusts the substitution cost in the Damerau-Levenshtein (DL) distance for QWERTY keypad mistakes, and the same idea may apply to other layouts too. Please take a look if you're free.
key_pairs = [{'q','a'},{'q','w'},{'w','a'},{'w','e'},{'w','s'},{'e','s'},{'e','d'},{'e','r'},{'r','d'},{'r','f'},{'r','t'},{'t','g'},{'t','y'},{'y','g'},{'y','h'},{'y','u'},{'u','h'},{'u','j'},{'u','i'},{'i','j'},{'i','k'},{'i','o'},{'o','k'},{'o','l'},{'o','p'},{'l','k'},{'m','k'},{'m','n'},{'n','j'},{'n','b'},{'b','h'},{'b','v'},{'v','g'},{'v','c'},{'c','f'},{'c','x'},{'x','d'},{'x','z'},{'z','s'}]
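def damerau_levenshtein_cost(a, b):
    # Identical characters cost nothing.
    if a == b:
        return 0
    # Neighbouring keys count as a cheap substitution.
    elif {a, b} in key_pairs:
        return 0.25
    # Any other substitution keeps the usual unit cost.
    return 1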
cost = damerau_levenshtein_cost(s1[i-1],s2[j-1])
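To show where that `cost` line sits, here is a minimal, self-contained sketch. It is not jellyfish's own implementation: it uses the simpler optimal string alignment (restricted DL) recurrence, and the names `KEY_PAIRS`, `substitution_cost`, and `weighted_osa_distance` are made up for illustration. Only a short excerpt of the key pairs is included.

```python
# Excerpt of the adjacent-key pairs from the snippet above (illustration only).
KEY_PAIRS = [{'q', 'a'}, {'q', 'w'}, {'w', 'e'}, {'e', 'r'}, {'r', 't'}]


def substitution_cost(a, b):
    """Return 0 for a match, 0.25 for neighbouring keys, 1 otherwise."""
    if a == b:
        return 0
    if {a, b} in KEY_PAIRS:
        return 0.25
    return 1


def weighted_osa_distance(s1, s2):
    """Optimal string alignment distance with a weighted substitution cost."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of s1[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of s2[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            # This is the line the proposal changes.
            cost = substitution_cost(s1[i - 1], s2[j - 1])
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or weighted substitution
            )
            # Transposition of two adjacent characters keeps unit cost.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With this sketch, `weighted_osa_distance("wet", "eet")` gives 0.25 because 'w' and 'e' are neighbours, while `weighted_osa_distance("wet", "xet")` stays at 1.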
I hope this hack finds its way into someone's stack.
This is really interesting. @DocShahrukh would this approach work for OCR errors too, assuming one came up with a useful weighting? So '1' paired with 'l', etc.
I think it’d make sense to make the cost function configurable; that’d let people do this for different layouts or with OCR-specific cost functions. I’d be glad to incorporate such a PR if anyone has time.
Don't forget QWERTZ (Germany, Austria and Eastern Europe)!
And AZERTY, Dvorak, and other layouts. Cost function really needs to be configurable, perhaps with a few standard costs.
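One way a configurable cost could look, as a rough sketch only: the distance function takes the substitution cost as a callable, and a small helper builds such callables from a list of "near" pairs. None of this is an existing jellyfish API. `make_pair_cost`, `unit_cost`, `osa_distance`, and the two preset names are hypothetical, the pair lists are tiny illustrative excerpts (the '1'/'l' pair is from the comment above, the rest are assumptions), and the recurrence is the restricted (optimal string alignment) variant.

```python
def unit_cost(a, b):
    """Classic cost: 0 for a match, 1 for any substitution."""
    return 0 if a == b else 1


def make_pair_cost(pairs, near_cost=0.25):
    """Build a cost function that charges near_cost for the listed pairs."""
    near = {frozenset(p) for p in pairs}

    def cost(a, b):
        if a == b:
            return 0
        return near_cost if frozenset((a, b)) in near else 1

    return cost


def osa_distance(s1, s2, substitution_cost=unit_cost):
    """Optimal string alignment distance with a pluggable substitution cost."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = substitution_cost(s1[i - 1], s2[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


# Hypothetical presets; real ones would need complete, vetted pair lists.
OCR_COST = make_pair_cost([('1', 'l'), ('0', 'O')])
QWERTZ_COST = make_pair_cost([('t', 'z'), ('z', 'u')])
```

Calling `osa_distance("hello", "he1lo")` uses the unit cost and gives 1, while `osa_distance("hello", "he1lo", OCR_COST)` gives 0.25. Passing a callable rather than a layout name keeps the distance function unaware of keyboards or OCR, so a few standard costs can ship next to it without limiting custom ones.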
This is really interesting. @DocShahrukh would this approach work for OCR errors too, assuming one came up with a useful weighting? So '1' paired with 'l', etc.
Indeed, see for example: