ClassicPress icon indicating copy to clipboard operation
ClassicPress copied to clipboard

Simplify localization process for multisitešŸ’”

Open elisabettac77 opened this issue 10 months ago ā€¢ 36 comments

Context

As Patrick found out, the peculiar way ClassicPress handles localization using one file for all strings (included the ones to localize a multisite network) poses a problem. WordPress uses multiple localization files to handle multisite correctly and there is code in various files handling which file should be uploaded based on conditional checks to verify if the site is a multisite and if it is front-end or admin and so on. Our peculiarity might result in multisite not being localized correctly. ClassicPress could hence benefit from a simplification like the one Patrick suggested here.

Possible implemantion

Quoting Patrick's suggestion:

function load_default_textdomain( $locale = null ) {
	if ( null === $locale ) {
		$locale = determine_locale();
	}

	// Unload previously loaded strings so we can switch translations.
	unload_textdomain( 'default' );

	// Reload strings if the locale has been switched.
	return load_textdomain( 'default', WP_LANG_DIR . "/$locale.mo", $locale );
}

Possible Solution

The solution should ensure that ClassicPress is able to correctly localize a multisite network even using one single file for all strings. The conditional check is no longer needed, and the files where it happens should be amended if this doesn't pose problems to other functions in ClassicPress

Will you be able to help with the implementation?

I am not directly able to contribute to core, but I am available to cooperate on behalf of the i18n team to the resolution of this issue by testing and troubleshooting.

elisabettac77 avatar Apr 15 '24 07:04 elisabettac77

This issue has been mentioned on ClassicPress Forums. There might be relevant details there:

https://forums.classicpress.net/t/help-translate-version-2-0/4845/37

ClassyBot avatar Apr 15 '24 07:04 ClassyBot

In addition to this PR. /wp-admin/includes/admin.php also attempts to load the admin-$locale.mo file.

Suggested code change would then be:

if ( ! defined( 'WP_ADMIN' ) ) {
   /*
    * This file is being included from a file other than wp-admin/admin.php, so
    * some setup was skipped. Make sure the admin message catalog is loaded since
    * load_default_textdomain() will not have done so in this context.
    */
   load_default_textdomain( get_locale() ); 
}

StonehengeCreations avatar Apr 15 '24 10:04 StonehengeCreations

I've been looking into this more and even WordPress 4.9 from which ClassicPress 1.0.0 was forked handles text domain files in chunks.

After building locally, I think the files could be recreated with the following commands:

wp i18n make-pot ./build ./en_US.pot --ignore-domain --skip-audit --exclude="wp-content,wp-admin" --package-name="ClassicPress 2.0.0"
wp i18n make-pot ./build ./admin-en_US.pot --ignore-domain --skip-audit --exclude="wp-activate.php,wp-comments-post.php,wp-cron.php,wp-links-opml.php,wp-load.php,wp-login.php,wp-mail.php,wp-signup.php,wp-trackback.php,wp-content,wp-includes,wp-admin/network.php,wp-admin/network,class-wp-ms-sites-list-table.php,class-wp-ms-themes-list-table.php,class-wp-ms-users-list-table.php" --package-name="ClassicPress 2.0.0"
wp i18n make-pot ./build ./admin-network-en_US.pot --ignore-domain --skip-audit --include="wp-admin/network.php,wp-admin/network,class-wp-ms-sites-list-table.php,class-wp-ms-themes-list-table.php,class-wp-ms-users-list-table.php" --package-name="ClassicPress 2.0.0"

I'd be very happy for some feedback from anyone with more knowledge and experience in handling language domains and translations.

mattyrob avatar Apr 15 '24 12:04 mattyrob

Wow, that seems quite cumbersome.

Personally, I was very glad to find out that CP was using only one file. One single file reduces the translators workload, because several (many?) strings will no longer be doubled. It will also improve consistency as comparing related or (almost) identical strings is much easier that way.

Btw, the topic is somewhat incorrect. This is not only for multisite installations. WP loads different mo files for either the frontend or the backend and an additional file for the Netword Admin pages.

StonehengeCreations avatar Apr 15 '24 13:04 StonehengeCreations

Yes a possible solution is recreating WP structure like you mention @mattyrob - another approach is simplifying the handling has @StonehengeCreations suggested (and keep strings in one file).

we need to consider the following to decide which approach is best:

  • once the first translation release is out, ideally Crowdin would automatically source new strings directly from the repo and it would also commit the new completed locales resulting from the localization of new strings automatically.
  • to maintain the WP structure (that would allow us not to change the core files) a step in the translation build process prior to including them in core is needed to divide the complete translation release file into the several CP is going to need
  • taking the approach Patrick suggests can help in cleaning up and making core code more simple

I think that adding that command as an automatic action that takes the released translations and prepares them to include in core adds a layer of complexity to the process, just to follow a WP standard, but I do recognize two things:

  • if we go with simplification of core code and one single file both CP v1 and v2 do need the change (double work)
  • if we go that way and realize later that we need to backport WP locales that we do not support at present we will need to adapt them to our structure prior to uploading them on Crowdin.

elisabettac77 avatar Apr 15 '24 13:04 elisabettac77

once the first translation release is out, ideally Crowdin would automatically source new strings directly from the repo and it would also commit the new completed locales resulting from the localization of new strings automatically.

Can Crowdin keep excluding the wp-content folder if this process is being automated?

if we go that way and realize later that we need to backport WP locales that we do not support at present we will need to adapt them to our structure prior to uploading them on Crowdin.

What do you mean by that? If a new locale would be needed, the same en_US.pot file would be used to create the new locale po file, right?

to maintain the WP structure (that would allow us not to change the core files)

Some core files have already been changed?

StonehengeCreations avatar Apr 15 '24 13:04 StonehengeCreations

Although the approach is complicated, if those commands above prove to be correct it can be automated.

I have been considering the server load overhead that might be avoided for front end users is admins strings are skipped - that is possibly a reason why loading was split originally especially as the project gets larger and larger.

mattyrob avatar Apr 15 '24 13:04 mattyrob

@StonehengeCreations to your questions:

  1. Crowdin can exclude whatever directory we set to exclude no problem
  2. about new locales, nowadays we support a small number of locales, to speed up work and not reinvent the wheel when a new locale is needed (say Vietnamese or whatever) we can see if WP supports it and in that case we can backport the "strings" they already translated for it into CP so that we can then align them with CP strings and translate/revise only what is needed.
  3. Yes, core files got changed. For example Vanilla JS is preferred where possible and where there is an ongoing cleanup work to change JQuery instances to Vanilla JS, all block code is taken out and functions polyfilled, a very huge a11y improvement has been taken care of by Tim and overall several other things that made CP faster, more reliable and stable than WP. All that taking into account not to introduce breaking changes in a disruptive way. That also resulted in updated strings (CP differs from WP in that regard)

@mattyrob I imagined they were splitting the files for that reason. The end users are going to want to load only the needed strings for the page they are visiting and it seems to make more sense. Reading the commands seems they are correct to me but I refer to Patrick about that because he surely is more expert than me in that regard.

elisabettac77 avatar Apr 15 '24 13:04 elisabettac77

Me thinks yes, for the admin bar when present? but it happens only if the check if the user is logged in gives a true response IMHO?

elisabettac77 avatar Apr 15 '24 14:04 elisabettac77

After building locally, I think the files could be recreated with the following commands:

wp i18n make-pot ./build ./en_US.pot --ignore-domain --skip-audit --exclude="wp-content,wp-admin" --package-name="ClassicPress 2.0.0"
wp i18n make-pot ./build ./admin-en_US.pot --ignore-domain --skip-audit --exclude="wp-activate.php,wp-comments-post.php,wp-cron.php,wp-links-opml.php,wp-load.php,wp-login.php,wp-mail.php,wp-signup.php,wp-trackback.php,wp-content,wp-includes,wp-admin/network.php,wp-admin/network,class-wp-ms-sites-list-table.php,class-wp-ms-themes-list-table.php,class-wp-ms-users-list-table.php" --package-name="ClassicPress 2.0.0"
wp i18n make-pot ./build ./admin-network-en_US.pot --ignore-domain --skip-audit --include="wp-admin/network.php,wp-admin/network,class-wp-ms-sites-list-table.php,class-wp-ms-themes-list-table.php,class-wp-ms-users-list-table.php" --package-name="ClassicPress 2.0.0"

I am sorry, but when I use the provided code, Super Admin and other Network strings are included in the admin-en_US.pot and the admin-network-en_US.pot itself is empty. The full path seems to be missing for class-wp-ms-*.php and there are actually more files related to multisite.

Full list of multisite core files:

  • /wp-admin/
    • ms-admin.php
    • ms-delete-site.php
    • ms-edit.php
    • ms-options.php
    • ms-sites.php
    • ms-themes.php
    • ms-upgrade-network.php
    • ms-users.php
    • my-sites.php
    • network.php
  • /wp-admin/includes/
    • class-wp-ms-sites-list-table.php
    • class-wp-ms-themes-list-table.php
    • class-wp-ms-users-list-table.php
    • ms-admin-filters.php
    • ms-deprecated.php
    • ms.php
  • /wp-admin/network/ (all files)
  • /wp-includes/
    • class-wp-network-query.php
    • class-wp-network.php
    • ms-blogs.php
    • ms-default-constants.php
    • ms-default-filters.php
    • ms-deprecated.php
    • ms-files.php
    • ms-functions.php
    • ms-load.php
    • ms-network.php
    • ms-settings.php
    • ms-site.php

Assuming you are running the following commands from within the CP root directory and placing the pot files in the WP_LANG_DIR:

wp i18n make-pot . wp-content/languages/en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --exclude="/wp-content/,/wp-admin/,/wp-includes/class-wp-network-*.php,/wp-includes/ms-*.php"

// Setting the source to the /wp-admin/ folder reduces exclusions.
wp i18n make-pot wp-admin wp-content/languages/admin-en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --exclude="ms-*.php,my-sites.php,network.php,/network/,/includes/class-wp-ms-*.php,/includes/ms.php,/includes/ms-*.php"

wp i18n make-pot . wp-content/languages/admin-network-en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --include="/wp-admin/ms-*.php,/wp-admin/my-sites.php,/wp-admin/network.php,/wp-admin/includes/class-wp-ms-*.php,/wp-admin/includes/ms.php,/wp-admin/includes/ms-*.php,/wp-admin/network/,/wp-includes/class-wp-network*.php,/wp-includes/ms-*.php"

StonehengeCreations avatar Apr 15 '24 20:04 StonehengeCreations

@StonehengeCreations

I found that setting the source to wp-admin messed up the reporting of the string location in the POT file - did you see that also?

Fantastic work listing all of the MS files, I'll update my shell script accordingly. I think I am getting close to something that is Pull Request ready.

mattyrob avatar Apr 16 '24 08:04 mattyrob

I found that setting the source to wp-admin messed up the reporting of the string location in the POT file - did you see that also?

Great catch! I did not see that.

This should work: wp i18n make-pot . wp-content/languages/admin-en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --exclude="*.php,/wp-content/,/wp-includes/,/wp-admin/ms-*.php,/wp-admin/my-sites.php,/wp-admin/network.php,/wp-admin/network/,/wp-admin/includes/class-wp-ms-*.php,/wp-admin/includes/ms.php,/wp-admin/includes/ms-*.php"

StonehengeCreations avatar Apr 16 '24 09:04 StonehengeCreations

@mattyrob, Hang on... Using my codes resulted in admin-en_US.pot only having 181 strings. I found that very odd. I rechecked the files and with the following commands the total comes to 7986 strings. If we were to use one pot file, the total would be 7487, which could indicatie that Ā±500 strings are doubled. Most likely within the admin and admin-network files, as the latter contains 522 strings.

wp i18n make-pot . ./en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --exclude="/wp-admin/,/wp-content/,/wp-includes/class-wp-network*.php,/wp-includes/ms-*.php"
wp i18n make-pot . ./admin-en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --exclude="wp-activate.php,wp-comments-post.php,wp-cron.php,wp-links-opml.php,wp-load.php,wp-login.php,wp-mail.php,wp-signup.php,wp-trackback.php,/wp-content/,/wp-includes/,/wp-admin/ms-*.php,/wp-admin/my-sites.php,/wp-admin/network.php,/wp-admin/network/,/wp-admin/includes/class-wp-ms-*.php,/wp-admin/includes/ms.php,/wp-admin/includes/ms-*.php"
wp i18n make-pot . ./admin-network-en_US.pot --package-name="ClassicPress 2.0.0" --ignore-domain --skip-audit --include="/wp-admin/ms-*.php,/wp-admin/my-sites.php,/wp-admin/network.php,/wp-admin/network/,/wp-admin/includes/class-wp-ms-*.php,/wp-admin/includes/ms.php,/wp-admin/includes/ms-*.php,/wp-includes/class-wp-network*.php,/wp-includes/ms-*.php"

I used the --debug option and got one error for the main en_US.pot file: Debug (make-pot): Could not parse file wp-includes/js/tw-sack.js: String contains invalid UTF-8 (83.942s)

StonehengeCreations avatar Apr 16 '24 10:04 StonehengeCreations

that is another issue, using one file means removing doubles. if we then spli it there will be missing strings. so Patrick is right. we need to divide before translating

elisabettac77 avatar Apr 16 '24 11:04 elisabettac77

@StonehengeCreations - I see what you mean, the "*.php" for root file exclusion is seemingly used recursively and excludes all PHP files. Leading and trailing slashes for directories are not needed, and I've updated my code with the above and also to remove POT files before creation as they seem to update only.

mattyrob avatar Apr 16 '24 12:04 mattyrob

@mattyrob, I know the leading and trailing slashes are not needed, but it does make it a bit more clear.

Side note: when I ran you first code for the en_US.pot file, it did not exclude wp-content and wp-admin, so that is why I decided to play it safe by using the slashes. Using --debug turned out to be very helpful.

@elisabettac77 - Good point. Perhaps first backup (download) the complete / un-split po and mo files first? Because this will be the first time the translations will be split, it might be a good idea to do that locally first.

If all else fails, I noticed that pre-translations within Poedit are about 95% consistent with the current WP translations.

StonehengeCreations avatar Apr 16 '24 13:04 StonehengeCreations

Please have a look at the branch on the crowdin github repository - it contains split POT files, a shell script with some automation and a readme.

If this looks good I can merge into the v2 branch and we can proceed with splitting the POT files.

mattyrob avatar Apr 16 '24 15:04 mattyrob

It looks very good to me. Well documented as well.

StonehengeCreations avatar Apr 16 '24 21:04 StonehengeCreations

@elisabettac77 - Since the split pot files are now available in the Crowing GitHub repository, should I start using those for the V2 nl_NL translations?

StonehengeCreations avatar Apr 17 '24 09:04 StonehengeCreations

@StonehengeCreations - I haven't yet created the split POT files or merged them to CrowdIn.

mattyrob avatar Apr 17 '24 13:04 mattyrob

Sorry for the delay in answerring, been a little busy.

I think when @mattyrob is ready to merge in Crowdin to obtain the split files we can start translating (and yes, machine pre-translation is very accurate because it is based on public TMs that means it follows also WP public translations)

Ideally our next steps are as follows:

  • download the un-split versions for both v1 and v2 just to be safe (or push them to a repo so that we have an history to retrieve if something goes wrong)
  • split the files for both v1 and v2 and re-upload them to Crowdin
  • connect Crowdin with the repo(s) that from that moment on are going to be the project source.
  • leave it set to pre-translate that makes it convenient and faster for our localizer to check strings.

elisabettac77 avatar Apr 17 '24 14:04 elisabettac77

Let me know when the downloads are done and safe, I'd be happier leaving that to experienced translators.

I'm all set to split the v2 POT files - I suppose I can do the same fr v1 but I had not planned for that.

mattyrob avatar Apr 17 '24 14:04 mattyrob

@StonehengeCreations - I haven't yet created the split POT files or merged them to CrowdIn.

The pot file is the template file (en_US.pot). Do you mean the po files?

StonehengeCreations avatar Apr 17 '24 14:04 StonehengeCreations

I downloaded the zip file of ALL locales.

One thing I forgot to mention, prior to re-uploading the split versions please destroy the repo i18n that Crowdin generates to push the translated locales in (since we have to regenerate it and it's something Crowdin does on its own as soon as we destroy it).

About v1 - its translations are not complete because we never got around to proceed due to COVID so it makes sense to have CP v1 and CP v2 project, but we can decide to have only v2 and if someone asks for v1 we can set it up later. It would make sense working on v2 mostly since v1 adopters might upgrade to v2 when they see we have also localized it. so ti could be a way to encourage adoption of v2

elisabettac77 avatar Apr 17 '24 14:04 elisabettac77

I think that can better focus on V2 right now. I do not think V1 had strings that are no longer in V2, so we could always extract V1 strings from V2, right?

StonehengeCreations avatar Apr 17 '24 15:04 StonehengeCreations

yes we could do that for v1, so no worries there. So that decision is settled also.

elisabettac77 avatar Apr 17 '24 15:04 elisabettac77

Changes are merged on the ClassicPress GitHub i18n CrowdIn repository and the Sync process are complete after a few configuration adjustments. I think the pranslation work for v2 can now commence.

mattyrob avatar Apr 17 '24 15:04 mattyrob

I will check it later this evening since I have to go out for a few. Will confirm if we are ready to start translating again as soon as I come back. Thanks @mattyrob for helping us set everything in place!

elisabettac77 avatar Apr 17 '24 15:04 elisabettac77

@mattyrob - Something goes wrong. If I now download the files from Crowdin, the only Dutch file is nl_NL.po and it only contains continents-cities strings. The same goes for every other locale.

Also, there seem to be strings missing in the admin-en_US.pot file. The uploaded version has only 3835 strings, but the file I created yesterday contains 4358 strings. Crowdin-files

StonehengeCreations avatar Apr 17 '24 16:04 StonehengeCreations

consider that the Crowdin needs to regenerate all locales with pre-translation using TMs so the source MUST be correct for it to generate the correct locales and the number of strings should obviously match in source and locales that are generated (they have to go in pair - one in source and one in locale) I have the zip file of all the locales downloaded before if we need it to correct the issues. (do not know if I can put it somewhere in GitHub?)

elisabettac77 avatar Apr 17 '24 16:04 elisabettac77