OpeningHours localization

Open HERTZNo1 opened this issue 3 years ago • 6 comments

In the opening hours module (https://github.com/bryceco/GoMap/tree/master/src/iOS/OpeningHours), strings that depend on the user's chosen language should be encapsulated in a separate file. Doing so would make the module easier to localize.

HERTZNo1 avatar May 06 '21 13:05 HERTZNo1

Are the strings below (and a bit of logic around 12-hour clocks) all there is to localizing the recognizer? Or are there other considerations? Would internationalizing the recognizer benefit from collecting training samples of opening hours posted in other languages?

https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L573-L580

https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L608

https://github.com/bryceco/GoMap/blob/88af91c5567db3fd719be7e10bf309bbcfe27d47/src/iOS/OpeningHours/HoursRecognizer.swift#L843-L856

1ec5 avatar Oct 28 '21 04:10 1ec5

Yes, that's all there is to it, but if your language is RTL or doesn't use decimal numbers then it will need a bunch more work.

Examples of hours for other languages are useful just to see minor differences in style, like French using 6h00 instead of 6:00, but there's no training in the ML sense.
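As a rough illustration of that kind of style difference, here is a minimal sketch that rewrites French-style times such as 6h00 into the 6:00 form before parsing. The helper name `normalizeTime` and the regular expression are assumptions made up for this example; they are not taken from HoursRecognizer.swift.

```swift
import Foundation

/// Hypothetical helper (not part of GoMap): rewrites times written with an
/// "h" separator, such as "6h00" or "18h", into "H:MM" form.
func normalizeTime(_ text: String) -> String {
    // One or two digits, an "h", then optional two-digit minutes.
    let pattern = "\\b(\\d{1,2})[hH](\\d{2})?\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return text }

    let ns = NSString(string: text)
    var result = ""
    var cursor = 0
    for match in regex.matches(in: text, range: NSRange(location: 0, length: ns.length)) {
        // Copy the untouched text that precedes this match.
        result += ns.substring(with: NSRange(location: cursor,
                                             length: match.range.location - cursor))
        let hours = ns.substring(with: match.range(at: 1))
        // A bare "18h" has no minutes group, so assume ":00".
        let minutesRange = match.range(at: 2)
        let minutes = minutesRange.location == NSNotFound
            ? "00"
            : ns.substring(with: minutesRange)
        result += "\(hours):\(minutes)"
        cursor = match.range.location + match.range.length
    }
    // Copy whatever follows the last match.
    result += ns.substring(from: cursor)
    return result
}

print(normalizeTime("de 9h à 18h30"))
```

Normalizing up front like this would keep the later parsing stage language-neutral, at the cost of one extra pass per supported notation.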

bryceco avatar Oct 28 '21 05:10 bryceco

I suspect each language or country will have additional complications, similar to address and phone number formats. For starters, Vietnamese refers to the days of the week using numerals (other than Sunday). The first line of the sign below says “Thứ 2 đến thứ 6: chiều từ 17 giờ đến 19 giờ”, which means “Monday through Friday: afternoons from 17:00 to 19:00”. It’s also very common for opening hours to specify morning and afternoon hours separated by a lunch break.

The days can also be abbreviated, like “T3” for Tuesday, and “h” is sometimes used in place of “giờ” (hour).
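The numeral-based weekday names could be handled with a small lookup along these lines. This is only a sketch: the `Day` enum and the `vietnameseDay` function are hypothetical names for this example, and accepting “CN” as an abbreviation for Chủ nhật (Sunday) is an assumption that is not confirmed anywhere in this thread.

```swift
import Foundation

/// OSM opening_hours day abbreviations (hypothetical enum for this example).
enum Day: String {
    case mo = "Mo", tu = "Tu", we = "We", th = "Th", fr = "Fr", sa = "Sa", su = "Su"
}

/// Hypothetical lookup: maps "Thứ 2" ... "Thứ 7", "T2" ... "T7" and
/// "Chủ nhật" to a day of the week. Returns nil for anything else.
func vietnameseDay(_ token: String) -> Day? {
    let text = token.lowercased().trimmingCharacters(in: .whitespaces)

    // Sunday is the one day that is not numbered ("cn" is an assumption).
    if text == "chủ nhật" || text == "cn" { return .su }

    // Strip the "thứ" or "t" prefix, leaving the numeral.
    let digits: Substring
    if text.hasPrefix("thứ") {
        digits = text.dropFirst("thứ".count)
    } else if text.hasPrefix("t") {
        digits = text.dropFirst(1)
    } else {
        return nil
    }

    // Numbering starts at 2 for Monday and ends at 7 for Saturday.
    guard let number = Int(digits.trimmingCharacters(in: .whitespaces)) else { return nil }
    let days: [Int: Day] = [2: .mo, 3: .tu, 4: .we, 5: .th, 6: .fr, 7: .sa]
    return days[number]
}
```

With this mapping, “Thứ 2 đến thứ 6” would resolve to the range Mo-Fr once “đến” is treated as the range separator.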

1ec5 avatar Oct 28 '21 06:10 1ec5

By the way, CLDR has a lot of data on date and time formats, some of which might be accessible through DateFormatter in Foundation. The big caveat is that CLDR’s formats are intended for output, whereas valid inputs could vary considerably.
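For example, a sketch like the following pulls weekday names and AM/PM markers for a locale out of `DateFormatter`. The function name `weekdayVocabulary` is made up for this example. Since these are output formats, the result would at best seed a list of candidate strings to recognize, not cover everything that appears on real signs.

```swift
import Foundation

/// Hypothetical helper: collects the weekday names and AM/PM markers that
/// Foundation exposes for a locale. Weekday arrays start with Sunday.
func weekdayVocabulary(localeIdentifier: String)
    -> (full: [String], short: [String], am: String, pm: String) {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: localeIdentifier)
    return (full: formatter.weekdaySymbols ?? [],
            short: formatter.shortWeekdaySymbols ?? [],
            am: formatter.amSymbol ?? "",
            pm: formatter.pmSymbol ?? "")
}

// Print the vocabulary for a few locales to compare styles.
for identifier in ["en_US", "fr_FR", "vi_VN"] {
    let vocabulary = weekdayVocabulary(localeIdentifier: identifier)
    print(identifier, vocabulary.short, vocabulary.am, vocabulary.pm)
}
```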

1ec5 avatar Oct 28 '21 06:10 1ec5

Oh, there are also some references to AM/PM that aren't localized at all. If you know of languages that use them, let me know!

bryceco avatar Oct 28 '21 07:10 bryceco

As long as the text around the numbers is fairly consistent (to disambiguate days from hours) then it could probably still be worked into the framework. I'm sure you're correct about needing lots of locale-specific tweaking.

Because the text recognition API can return blocks of text from different parts of the image in arbitrary order, a lot of the code just deals with tokenizing the text and then sorting those tokens top to bottom and left to right, so that the actual syntax parsing has reasonable input.
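The sorting step could look roughly like the sketch below. It is not the code in HoursRecognizer.swift: the `Token` struct, the `readingOrder` function, the top-left coordinate origin, and the "half a line height" row threshold are all assumptions for illustration.

```swift
/// Hypothetical token: recognized text plus part of its bounding box.
/// Assumes a top-left origin, so y grows downward.
struct Token {
    let text: String
    let x: Double       // left edge
    let y: Double       // top edge
    let height: Double
}

/// Sorts tokens into reading order: rows top to bottom, then left to right
/// within each row.
func readingOrder(_ tokens: [Token]) -> [Token] {
    let byTop = tokens.sorted { $0.y < $1.y }
    var rows: [[Token]] = []
    for token in byTop {
        // A token joins the current row if its vertical center is within
        // half a line height of the center of the row's first token.
        if let first = rows.last?.first,
           abs((token.y + token.height / 2) - (first.y + first.height / 2)) < first.height / 2 {
            rows[rows.count - 1].append(token)
        } else {
            rows.append([token])
        }
    }
    // Order each row left to right, then concatenate the rows.
    return rows.flatMap { row in row.sorted { $0.x < $1.x } }
}
```

Grouping into rows first, and not sorting with a single "roughly the same y, then x" comparator, avoids a comparator that is not a consistent ordering when tokens on the same line are slightly misaligned.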

bryceco avatar Oct 28 '21 07:10 bryceco

Closing this due to lack of demand. Until there's a large corpus of languages being supported I don't have any confidence that translating to a new language will "just work" without code tweaks as well, so refactoring the translations into a separate file seems premature.

bryceco avatar Dec 18 '22 01:12 bryceco