GoMap
OpeningHours localization
In the opening hours module (https://github.com/bryceco/GoMap/tree/master/src/iOS/OpeningHours), strings that depend on the user's chosen language should be encapsulated in a separate file. Doing so would make the module easier to localize.
Are the strings below (and a bit of logic around 12-hour clocks) all there is to localizing the recognizer? Or are there other considerations? Would internationalizing the recognizer benefit from collecting training samples of opening hours posted in other languages?
https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L573-L580
https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L608
https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L843-L856
Yes, that's all there is to it, but if your language is RTL or doesn't use decimal numbers then it will need a bunch more work.
Examples of hours for other languages are useful just to see minor differences in style, like French using 6h00 instead of 6:00, but there's no training in the ML sense.
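To make the 6h00 versus 6:00 point concrete, here is a minimal sketch of a time parser that accepts both styles. `parseTime` is a hypothetical helper written for illustration and is not taken from HoursRecognizer.swift; which separators a real sign uses is an assumption.

```swift
import Foundation

/// Hypothetical helper (not part of GoMap): parses "6:00", "6.30", "6h00", or "17h"
/// into an hour and minute. Returns nil when the text is not a plausible time.
func parseTime(_ text: String) -> (hour: Int, minute: Int)? {
    // Group 1: hour. Group 2: minutes after ":" or ".". Group 3: optional minutes after "h".
    let pattern = "^([0-9]{1,2})(?:[:.]([0-9]{2})|[hH]([0-9]{2})?)$"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let hourRange = Range(match.range(at: 1), in: text),
          let hour = Int(text[hourRange]), hour <= 24
    else { return nil }

    // Whichever minute group matched (if any) supplies the minutes; "17h" means 17:00.
    let minuteRange = [2, 3].compactMap { Range(match.range(at: $0), in: text) }.first
    var minute = 0
    if let minuteRange = minuteRange {
        guard let parsed = Int(text[minuteRange]), parsed < 60 else { return nil }
        minute = parsed
    }
    return (hour, minute)
}
```

With this sketch, "6h00" and "6:00" both parse to hour 6, minute 0, while a bare number such as "6" is rejected so it can't be confused with a day numeral.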
I suspect each language or country will have additional complications, similar to address and phone number formats. For starters, Vietnamese refers to the days of the week using numerals (other than Sunday). The first line of the sign below says “Thứ 2 đến thứ 6: chiều từ 17 giờ đến 19 giờ”, which means “Monday through Friday: afternoons from 17:00 to 19:00”. It’s also very common for opening hours to specify morning and afternoon hours separated by a lunch break.
The days can also be abbreviated like “T3” for Tuesday, and “h” is also used sometimes.
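As a rough illustration of the numbered days, the sketch below maps Vietnamese day names to OSM day abbreviations. `osmDay(fromVietnamese:)` is a hypothetical function, not something in GoMap, and it assumes the common convention that “Thứ 2” is Monday, “Thứ 7” is Saturday, and Sunday is “Chủ nhật” (abbreviated “CN”).

```swift
/// Hypothetical lookup (not part of GoMap): converts a Vietnamese day name such as
/// "Thứ 2", "thứ 6", "T3", or "Chủ nhật" to an OSM opening_hours day abbreviation.
func osmDay(fromVietnamese text: String) -> String? {
    let osmDays = ["Mo", "Tu", "We", "Th", "Fr", "Sa"]  // Thứ 2 ... Thứ 7
    // Ignore case and spacing so "Thứ 2", "thứ2", and "T 2" are treated alike.
    let normalized = text.lowercased().filter { !$0.isWhitespace }
    if normalized == "chủnhật" || normalized == "cn" { return "Su" }
    for prefix in ["thứ", "t"] where normalized.hasPrefix(prefix) {
        // The remainder after the prefix must be a numeral from 2 to 7.
        if let number = Int(normalized.dropFirst(prefix.count)), (2...7).contains(number) {
            return osmDays[number - 2]
        }
    }
    return nil
}
```

Here `osmDay(fromVietnamese: "Thứ 2")` gives "Mo" and `osmDay(fromVietnamese: "T3")` gives "Tu". The hard part in practice is telling these numerals apart from hours, which this sketch does not attempt.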
By the way, CLDR has a lot of data on date and time formats, some of which might be accessible through DateFormatter in Foundation. The big caveat is that CLDR’s formats are intended for output, whereas valid inputs could vary considerably.
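For example, a lookup table of localized weekday names could be seeded from DateFormatter rather than typed by hand. This is only a sketch: `weekdayLookup` is a hypothetical helper, and the names it returns are output formats, so abbreviations that actually appear on signs may well be missing.

```swift
import Foundation

/// Hypothetical helper (not part of GoMap): maps lowercased localized weekday names,
/// full and short, to OSM day abbreviations using Foundation's locale data.
func weekdayLookup(localeIdentifier: String) -> [String: String] {
    // DateFormatter's weekday symbol arrays start with Sunday.
    let osmDays = ["Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"]
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: localeIdentifier)

    let fullNames: [String] = formatter.weekdaySymbols ?? []
    let shortNames: [String] = formatter.shortWeekdaySymbols ?? []

    var lookup: [String: String] = [:]
    for names in [fullNames, shortNames] {
        for (name, osmDay) in zip(names, osmDays) {
            lookup[name.lowercased()] = osmDay
        }
    }
    return lookup
}
```

A recognizer could consult `weekdayLookup(localeIdentifier: "fr_FR")` first and fall back to hand-written variants for whatever the locale data does not cover.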
Oh, there are also some references to AM/PM that aren't localized at all. If you know of languages that use them, let me know!
As long as the text around the numbers is fairly consistent (to disambiguate days from hours) then it could probably still be worked into the framework. I'm sure you're correct about needing lots of locale-specific tweaking.
Because the text recognition API can return blocks of text from different parts of the image in arbitrary order, a lot of the code just deals with tokenizing the text and then sorting those tokens top to bottom and left to right, so that the actual syntax parsing has reasonable input.
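The ordering step could look roughly like the sketch below. It is not the code in HoursRecognizer.swift: the `Token` type, its fields, and the row-grouping tolerance of half a token height are all assumptions made for illustration, and y is assumed to increase downward (flip it first if the recognition API uses a different origin).

```swift
/// Hypothetical token type: recognized text plus the position of its bounding box.
struct Token {
    let text: String
    let x: Double       // left edge
    let y: Double       // top edge, assumed to increase downward
    let height: Double
}

/// Orders tokens top to bottom, then left to right within each row.
func readingOrder(_ tokens: [Token]) -> [Token] {
    let byTop = tokens.sorted { $0.y < $1.y }
    var rows: [[Token]] = []
    for token in byTop {
        // A token joins the current row when its top edge is within half a token
        // height of the row's first token; otherwise it starts a new row.
        if let first = rows.last?.first, abs(token.y - first.y) < first.height / 2 {
            rows[rows.count - 1].append(token)
        } else {
            rows.append([token])
        }
    }
    return rows.flatMap { row in row.sorted { $0.x < $1.x } }
}
```

Grouping into rows before sorting by x matters because tokens on the same printed line rarely share exactly the same y value, so a plain sort on (y, x) would interleave them.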
Closing this due to lack of demand. Until there's a large corpus of languages being supported I don't have any confidence that translating to a new language will "just work" without code tweaks as well, so refactoring the translations into a separate file seems premature.