appstream-glib

Don't ignore localized strings that are the same as original

Open felipeborges opened this issue 5 years ago • 9 comments

Avoiding storing identical strings is a clever measure to keep the appstream file small and clean, but it ignores translation differences between languages and their specific locales.

Applications tend to fall back from a missing locale by picking a translation of the same language from a different locale. For instance, when a "pt_BR" translation is missing, some apps will pick the "pt" translation instead. That usually works, but there are cases where it doesn't, such as international words: Brazilian Portuguese (pt_BR) tends to use them, while European Portuguese (pt) has a translation for everything.

This way, "GNOME Boxes" gets translated to "Caixas GNOME" in European Portuguese (pt) but the same "GNOME Boxes" name is expected in Brazilian Portuguese (pt_BR).
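The interaction between deduplication and consumer-side fallback can be sketched as follows. This is an illustration only, not appstream-glib's actual code: the function names `dedupe_identical` and `lookup` are hypothetical, and the fallback rule (strip the territory suffix, then use the original) is an assumption about how a typical consumer behaves.

```python
def dedupe_identical(translations, original):
    """Drop every localized string that equals the untranslated original."""
    return {loc: text for loc, text in translations.items() if text != original}


def lookup(translations, original, locale):
    """Resolve a string for `locale`: exact match, then the bare language
    (pt_BR -> pt), then the untranslated original. Assumed consumer behaviour."""
    candidates = [locale]
    if "_" in locale:
        candidates.append(locale.split("_", 1)[0])
    for candidate in candidates:
        if candidate in translations:
            return translations[candidate]
    return original


original = "GNOME Boxes"
translations = {"pt": "Caixas GNOME", "pt_BR": "GNOME Boxes"}

# What ends up stored once strings identical to the original are dropped.
stored = dedupe_identical(translations, original)

result_full = lookup(translations, original, "pt_BR")  # uses the pt_BR entry
result_deduped = lookup(stored, original, "pt_BR")     # pt_BR is gone, so pt wins
```

With the full set, a pt_BR user gets the pt_BR string; after deduplication the pt_BR entry no longer exists, so the lookup lands on the pt string instead of the original.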

This was initially reported as a Flatpak issue in https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html because the OP was seeing the wrong translations in Flathub and GNOME Software.

felipeborges avatar May 23 '19 12:05 felipeborges

How much does this affect the size of something like the Fedora metadata? Certainly en_GB is 99.9% the same as C although I do concede that gzip should dedupe this effectively. @kalev what do you think?

hughsie avatar May 23 '19 12:05 hughsie

I'd say it's probably best to go for correctness, even if it increases the metadata size. Let's see how much bigger it gets and back this change out if it's unacceptably large?

kalev avatar May 23 '19 13:05 kalev

How about keeping a localized string that equals the original only when a translation for the fallback language ('pt', in Felipe's example) is present? If it is absent, there will be no fallback and no problem for other locales (e.g. pt_BR). This way, lines wouldn't be added unless required, and the file size may not grow too much.
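A minimal sketch of this idea, under stated assumptions: the function name `dedupe_unless_shadowed` is hypothetical, the parent language is assumed to be the part of the locale before the underscore, and "present" is read as "the parent-language string differs from the original, so it would survive deduplication". None of this is appstream-glib's API.

```python
def dedupe_unless_shadowed(translations, original):
    """Drop strings identical to the original, except where a parent-language
    translation would otherwise be picked up as a fallback."""
    kept = {}
    for locale, text in translations.items():
        if text != original:
            kept[locale] = text  # a real translation is always kept
            continue
        parent = locale.split("_", 1)[0]
        # Keep the identical string only if a differing parent entry exists.
        if parent != locale and translations.get(parent, original) != original:
            kept[locale] = text
    return kept


original = "GNOME Boxes"
translations = {
    "pt": "Caixas GNOME",
    "pt_BR": "GNOME Boxes",  # identical, but "pt" would shadow it
    "en_GB": "GNOME Boxes",  # identical, and there is no "en" entry
}
kept = dedupe_unless_shadowed(translations, original)
```

Here pt_BR is retained because a differing pt entry exists, while en_GB is still dropped, so the file grows only where a fallback would otherwise give the wrong string.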

rffontenelle avatar May 23 '19 13:05 rffontenelle

> How about keeping a localized string that equals the original only when a translation for the fallback language ('pt', in Felipe's example) is present? If it is absent, there will be no fallback and no problem for other locales (e.g. pt_BR). This way, lines wouldn't be added unless required, and the file size may not grow too much.

My understanding is that the fallback is done on the application side. All consumers of appstream.xml will read from it (GNOME Software, Flathub, other-software-store-thingies) and they are the ones using this heuristic to fall back for missing translations.

felipeborges avatar May 24 '19 08:05 felipeborges

> and they are the ones using this heuristic to fall back for missing translations

Old versions of gnome-software use appstream-glib to get the "best" translation, so it should be enough to fix it here for those older versions. For newer versions of gnome-software we parse it directly with libxmlb and so any fix would need to be copied there. I'm going to do some benchmarks compiling the entire Fedora repo today with the dedupe functionality and without. It does take some time...

hughsie avatar May 24 '19 08:05 hughsie

I'm guessing this wasn't tested :) The XML file is the same size, even post decompression. I think this patch needs to drop AS_NODE_INSERT_FLAG_DEDUPE_LANG -- I'm just running the generator again with that dropped to see what it does to the XML file.

hughsie avatar May 24 '19 12:05 hughsie

Hey there, is there anything blocking this PR? Now and then Brazilians ask about this issue.

rffontenelle avatar Dec 11 '19 18:12 rffontenelle

> Hey there, is there anything blocking this PR?

Well, there's CI failure and the fact that the patch doesn't work, but other than that...

hughsie avatar Dec 12 '19 08:12 hughsie

Hi @felipeborges, I tried to update your branch but I don't have the rights, so I just commented on it. The changes should be enough to pass the tests.

igaldino avatar Apr 13 '20 19:04 igaldino