pontoon icon indicating copy to clipboard operation
pontoon copied to clipboard

Polish: Non-breaking spaces converted to normal spaces when serializing Android files

Open flodolo opened this issue 6 months ago • 8 comments

https://github.com/mozilla-l10n/android-l10n/commit/4391c4c35544122fd21f378eca1320231be85005

I don't see anything in Pontoon showing a non-breaking space, but there is match in TM https://pontoon.mozilla.org/pl/firefox-for-android/all-resources/?search=mozac_feature_addons_permissions_privacy_description&search_identifiers=true&string=206783

If I look at the previous version of the file, it has non-breaking spaces https://github.com/mozilla-l10n/android-l10n/blob/75a7a5cfe8ee948aa788e673ab09b72a6a0c6caf/mozilla-mobile/android-components/components/feature/addons/src/main/res/values-pl/strings.xml

CC @eemeli @piotrdrag

flodolo avatar Jun 15 '25 18:06 flodolo

Worth noting that I couldn't spot anything similar happening for French, although the number of non-breaking spaces is a lot smaller.

flodolo avatar Jun 16 '25 07:06 flodolo

Feh, this is indeed due to the changes in #3611, via a sequence that I'd not accounted for. The intent with this code https://github.com/mozilla/pontoon/blob/135da2111108b4a4e12ee2fe6917315400ed2cbf/pontoon/sync/formats/init.py#L54-L58 is that thanks to the android_ascii_spaces we'd catch the abnormal spaces in the file and ensure that they're accounted for, even if they'd been getting collapsed into normal spaces when actually rendering.

So while that part of the code works just as it should, I didn't account for it in the migration, which has this: https://github.com/mozilla/pontoon/blob/135da2111108b4a4e12ee2fe6917315400ed2cbf/pontoon/base/migrations/0075_fix_android_ini_properties.py#L10 https://github.com/mozilla/pontoon/blob/135da2111108b4a4e12ee2fe6917315400ed2cbf/pontoon/base/migrations/0075_fix_android_ini_properties.py#L20 which maps each translation into the representation that'll be used for them when formatting, i.e. a normal space for a non-escaped non-breaking space.

This was not caught in the automated or manual tests, because in all of those, we end up doing a force-sync. With that, we do the following:

  1. Apply the migration, so non-breaking spaces become spaces in the database.
  2. Force-sync from repo to db, effectively reverting the previous step for non-ASCII spaces.
  3. Given that the repo and db now match, nothing changes in the repo.

However, in prod we didn't apply a force-sync to the whole project, and because nothing has changed in the repo, we skip reading the file before we write to it when any string in the file changes.

It might be a good idea to force-sync the android-l10n projects to fix the database for any translations that haven't had their spaces "fixed" yet in the repo.

Note that the original change here is not user-visible, because previously we were not escaping the non-breaking spaces, which meant that Android's XML parser was already collapsing them into normal spaces. This was a bug in our Android serialization that was fixed in #3611, at least for future translations.

eemeli avatar Jun 16 '25 10:06 eemeli

Sorry but I'm not following your line of thoughts here.

which maps each translation into the representation that'll be used for them when formatting, i.e. a normal space for a non-escaped non-breaking space.

Note that the original change here is not user-visible, because previously we were not escaping the non-breaking spaces, which meant that Android's XML parser was already collapsing them into normal spaces.

If I understand your explanation correctly, I feel like we fixed the problem the other way around.

If Pontoon had a non-breaking space in the translation, we should have serialized them so that Android will treat them as non-breaking space. Instead, we converted unescaped non-breaking spaces to normal spaces, effectively changing the translation?

flodolo avatar Jun 16 '25 10:06 flodolo

Instead, we converted unescaped non-breaking spaces to normal spaces, effectively changing the translation?

Sort of. We changed the translation in Pontoon, while not changing how the translation showed up in the product. As a result, we stopped misleading localisers about how their messages were being presented to users.

But yes, we should not have included the replacement of [^\S \n] with a space in the migration, which would have had the desired effect. That was a bug in the migration which was masked by all of the testing using force-sync, which also helped retain the special spaces.

eemeli avatar Jun 16 '25 10:06 eemeli

@eemeli Is the next step here to create another data migration, which will convert regular spaces back to non-breaking spaces based on the non-breaking space presence in the TranslationMemoryEntry table?

mathjazz avatar Jun 18 '25 08:06 mathjazz

That could work. This query seems to get pretty close, but still includes some false positives due to the other spacing changes:

select
  t.id,
  replace(t.string, E'\n', '¶') as ts,
  replace(m.target, E'\n', '¶') as ms
from base_translationmemoryentry m
join base_translation t on t.id = m.translation_id
join base_entity e on e.id = t.entity_id
join base_resource r on r.id = e.resource_id
where
  r.format = 'xml' and
  t.string != m.target and
  m.target similar to '%[^\S \n]%';

eemeli avatar Jun 18 '25 09:06 eemeli

Asking a possibly obvious question: if you input a translation with a non-breaking space in Android projects, it gets serialized and read back correctly with the current code (guessing escaped)?

flodolo avatar Jun 18 '25 10:06 flodolo

Asking a possibly obvious question: if you input a translation with a non-breaking space in Android projects, it gets serialized and read back correctly with the current code (guessing escaped)?

Yes.

eemeli avatar Jun 18 '25 10:06 eemeli