pycountry
search_fuzzy should support typos and slight variations
In v19.8.18, searching for 'united state of america' is a miss (LookupError), which surprised me. It should return United States of America.
are you referring to the missing 's' in the search or is this about capitalization?
The missing 's'.
On Thu, 2 Jul 2020 at 20:46, Christian Theune [email protected] wrote:
are you referring to the missing 's' in the search or is this about capitalization?
--
Tim Richardson CPA, Director GrowthPath. Finance transformation for SMEs via Cloud ERP, advanced reporting, CRM
Mobile: +61 423 091 732 Office/Reception: +61 3 8678 1850. Book call: https://vyte.in/growthpath/15 Timezone is Melbourne AU. See this link for international time planning: https://www.timeanddate.com/worldclock/meeting.html?year=2020&month=5&day=16&p1=152
GrowthPath Pty Ltd ABN 18100392326 Xero Gold Partner. Dear Inventory, Zoho Analytics and Cin7 Implementation Partner. Custom integration specialists.
I was thinking of adding scoring based on difflib from the standard library, but I haven't thought much about how this would fit with the existing 'fuzzy matches'. Do you consider the matches we currently get to be ranked? I ask because it is hard to score a proximity match with difflib in a way that fits the current order of results. Your existing code has some heuristics for matching which make particular sense for country names, yet the case in my bug report is a bad miss. I think scoring with difflib is genuine fuzzy matching and should be a new method, which could then be tweaked with heuristics. That would keep full backwards compatibility: the current matching wouldn't change, and the new behaviour would only be available through the new method.
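To make the difflib idea concrete, here is a minimal sketch of what such a separate scoring method might look like. The function name `search_by_similarity`, the `cutoff` and `limit` parameters, and the hand-picked `SAMPLE_NAMES` list are all hypothetical; none of this is pycountry's API, and a real version would draw names from the country database.

```python
import difflib

# Hypothetical stand-in for the names a real implementation would read
# from the country database.
SAMPLE_NAMES = [
    "United States of America",
    "United States",
    "United Kingdom",
    "United Arab Emirates",
    "Australia",
    "Germany",
]


def search_by_similarity(query, names, cutoff=0.6, limit=5):
    """Return (score, name) pairs ranked by difflib similarity, best first.

    cutoff and limit are illustrative defaults, not tuned values.
    """
    needle = query.casefold()
    scored = []
    for name in names:
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        score = difflib.SequenceMatcher(None, needle, name.casefold()).ratio()
        if score >= cutoff:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    for score, name in search_by_similarity("united state of america", SAMPLE_NAMES):
        print(f"{score:.3f}  {name}")
```

With this sample list, the query from the bug report ranks "United States of America" first. Because it is a standalone function, the existing matching order is left untouched.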
What do you think?
Actually, Levenshtein or a similar distance would be helpful, but it is likely much more expensive computationally. We could take a look at https://stackoverflow.com/questions/20162894/alternative-to-levenshtein-and-trigram for example.
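For reference, a plain-Python sketch of the Levenshtein distance, assuming the textbook dynamic-programming formulation with two rows. The function name is only illustrative. Each comparison costs time proportional to len(a) * len(b), and a search would repeat that for every candidate name, which is where the compute concern comes from.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]
```

The query from the bug report is at distance 1 from "united states of america" under this measure.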
Or the 'any partial substring' search (as in Sublime Text, for example) might be useful. But that only compensates for missing characters, not for extra characters or a wrong order.
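A minimal sketch of that kind of match, assuming it means "the query's characters appear in the candidate in the same order, with gaps allowed". The name `is_subsequence` is hypothetical, and Sublime Text's actual algorithm also ranks the hits, which this does not.

```python
def is_subsequence(query, candidate):
    """True if every character of query occurs in candidate in order,
    possibly with other characters in between (case-insensitive)."""
    remaining = iter(candidate.casefold())
    # 'ch in remaining' consumes the iterator up to the first match, so
    # each query character must be found after the previous one.
    return all(ch in remaining for ch in query.casefold())
```

This accepts 'united state of america' for "United States of America", but rejects 'united statess of america' (one character too many) and 'untied states' for "United States" (wrong order).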
It would be good to copy from https://github.com/life4/textdistance and https://github.com/jamesturk/jellyfish