appstream-glib
Don't ignore localized strings that are the same as the original
Not storing localized strings that are identical to the original is a clever way to keep the appstream file small and clean, but it ignores translation differences between a language and its regional locales.
Applications tend to fall back from a missing locale by picking a translation of the same language from a different locale. For instance, when a "pt_BR" translation is missing, some apps will pick the "pt" translation instead. That usually works, but there are cases where it doesn't, such as with international words: Brazilian Portuguese (pt_BR) tends to use them as-is, while European Portuguese (pt) has a translation for everything.
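For example, "GNOME Boxes" is translated to "Caixas GNOME" in European Portuguese (pt), while the unchanged "GNOME Boxes" name is expected in Brazilian Portuguese (pt_BR). Because the pt_BR string is identical to the original, it is dropped, and pt_BR users end up falling back to "Caixas GNOME".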
This was initially reported as a Flatpak issue in https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html, because the OP was seeing the wrong translations in Flathub and GNOME Software.
How much does this affect the size of something like the Fedora metadata? Certainly en_GB is 99.9% the same as C, although I do concede that gzip should dedupe this effectively. @kalev what do you think?
I'd say it's probably best to go for correctness, even if it increases the metadata size. Let's see how much bigger it gets and back this change out if it's unacceptably large?
How about keeping localized strings that are equal to the original only when a translation for the fallback language ('pt', in Felipe's example) is present? If it is absent, there is no fallback and no problem for other locales of that language (e.g. pt_BR). This way, lines wouldn't be added unless required, and the file size may not grow too much.
My understanding is that the fallback is done on the application's side. All consumers of appstream.xml (GNOME Software, Flathub, other-software-store-thingies) read from it, and they are the ones using this heuristic to fall back for missing translations.
Old versions of gnome-software use appstream-glib to get the "best" translation, so fixing it here should be enough for those older versions. Newer versions of gnome-software parse the metadata directly with libxmlb, so any fix would need to be copied there too. Today I'm going to run some benchmarks compiling the entire Fedora repo with and without the dedupe functionality. It does take some time...
I'm guessing this wasn't tested :) The XML file is the same size, even after decompression. I think this patch needs to drop AS_NODE_INSERT_FLAG_DEDUPE_LANG. I'm running the generator again with that flag dropped to see what it does to the XML file.
Hey there, is there anything blocking this PR? Every now and then, Brazilians ask about this issue.
Well, there's a CI failure and the fact that the patch doesn't work, but other than that...
Hi @felipeborges, I tried to update your branch, but I don't have the rights to, so I posted my changes as comments instead. They should be enough to make the tests pass.