archinstall icon indicating copy to clipboard operation
archinstall copied to clipboard

Rework system locale

Open codefiles opened this issue 3 years ago • 28 comments

PR Description:

  • Update list_locales() to use /usr/share/i18n/SUPPORTED.
  • Verify locales are supported by checking against /usr/share/i18n/SUPPORTED.
  • Uncomment entries in /etc/locale.gen or append if an entry is not found to uncomment.
  • Use localedef for generating locales instead of locale-gen.
  • Do not write to /etc/locale.gen when it does not need to be modified.
  • Do not generate locales when the given locales are already generated.
  • Do not set the system locale when it is already set to the given locale.
  • Strip encodings from options in the guided installer Locale language menu leaving only languages since encodings are selected in the Locale encoding menu.
  • Add support for generating multiple locales (to add to the guided installer in the future).

Tests and Checks

  • [ ] I have tested the code!

codefiles avatar Aug 14 '22 00:08 codefiles

This looks great! It's a breaking change as it will break all existing configurations (which is fine), so I'd like to propose that we postpone merging this (when you feel ready with the PR) until next release, that way we can add a warning to the current functionalities that they're being deprecated.

Torxed avatar Aug 14 '22 07:08 Torxed

In previous changes like this we did support backwards compatibility, for example superuser configuration. I think it shouldn't be too difficult to get that in here, essentially when parsing the config we need to check if there's an old version entry present and if so convert it to the new one.

Something like in archinstall/lib/models/users.py

However, i am very much in favor of providing a deprecated warning (also introduced in users) to inform that this will go away. This way we'll clean up this code and don't have to maintain it.

svartkanin avatar Aug 14 '22 08:08 svartkanin

However, i am very much in favor of providing a deprecated warning (also introduced in users) to inform that this will go away. This way we'll clean up this code and don't have to maintain it.

Yea this would be optimal, since we have about 1-2 months between releases any way I think having a deprecation warning is more than enough. We could do the same for the superuser configuration tbh.

Torxed avatar Aug 14 '22 10:08 Torxed

I've changed the scope of the pull request to focus on non-breaking changes. I will follow it up with a pull request for the unification of locale language and locale encoding into locale if that is a desired change.

codefiles avatar Aug 17 '22 00:08 codefiles

localedef was a new one for me. Looks really good. I'll try to run a few tests in a day or so and merge this in :)

Torxed avatar Aug 17 '22 10:08 Torxed

@Torxed, considering what you mentioned here: https://github.com/archlinux/archinstall/issues/1200#issuecomment-1130564952, is there a reason locale and hostname are not parameters to minimal_installation()?

-def minimal_installation(self, testing=False, multilib=False) -> bool:
+def minimal_installation(self, testing :bool = False, multilib :bool = False, hostname :str = 'archinstall', locale :Tuple[str, str] = ('en_US', 'UTF-8')) -> bool:

codefiles avatar Aug 18 '22 04:08 codefiles

@codefiles There's really no reason for it no.

Long answer: minimal_installation() didn't even set those before, it only did the bare minimum to get something up and running. Setting a hostname wasn't considered a requirement for that. Equally so, the locale would default to something so there was no need to set one.

As the project grew, those are sane defaults in almost any minimal installation so the function evolved. And since the values would get overridden by user choices later in guided.py we just set two sane default values in case the user skipped it. But making them into a parameter and sending it in and setting it there and then - makes a lot of sense.

Torxed avatar Aug 18 '22 14:08 Torxed

@Torxed Thanks for the insight. I'm just getting familiar with the project. I wanted to make sure I wasn't overlooking anything because it seems like an easy fix that would be a non-breaking change.

I created a locale class in locale_helpers.py. Here is what it looks like when used on an installed system:

>>> from archinstall.lib.locale_helpers import Locale
>>> Locale('en_US').set()
Setting locale: en_US.UTF-8
- Already generated
- Already set in /etc/locale.conf
- Already uncommented in /etc/locale.gen
Done
True
>>> Locale('en_US', 'ISO-8859-1').set()
Setting locale: en_US.ISO-8859-1
- Generating locale...
- Locale generated
- Updated /etc/locale.conf
- Updated /etc/locale.gen
Done
True
>>> Locale('sv_FI@euro').set()
Setting locale: sv_FI.UTF-8@euro
Entry 'sv_FI@euro UTF-8' not found in /etc/locale.gen
False
>>> Locale('sv_FI@euro', 'ISO-8859-15').set()
Setting locale: sv_FI.ISO-8859-15@euro
- Generating locale...
- Locale generated
- Updated /etc/locale.conf
- Updated /etc/locale.gen
Done
True
>>> Locale('en_US', 'UTF-8').set()
Setting locale: en_US.UTF-8
- Generating locale...
- Locale generated
- Updated /etc/locale.conf
- Updated /etc/locale.gen
Done
True

codefiles avatar Aug 20 '22 05:08 codefiles

Great modification, very clean and separates the functionality well from the installer code. I'll run a few tests in the afternoon (local time) and merge this later today if you feel ready with it :)

Torxed avatar Aug 22 '22 07:08 Torxed

screenshot

Saving one of the odd results from my testing that stood out. Not sure I entered a valid combination hehe. locale-gen accepted it, but it's not one of the valid pre-defined combinations from /etc/locale.conf.

Torxed avatar Aug 22 '22 10:08 Torxed

My thinking behind validating against /etc/locale.gen is that this file contains the locales that are supported by glibc. It gets created by the glibc PKGBUILD using the SUPPORTED file from the glibc repository.

It is possible this is a faulty idea and users would want locales that are not supported by glibc? The localization section of the installation guide on ArchWiki mentions the following:

Edit /etc/locale.gen and uncomment en_US.UTF-8 UTF-8 and other needed locales.

The generating locales section of the locales page of the ArchWiki:

Before a locale can be enabled on the system, it must be generated. This can be achieved by uncommenting applicable entries in /etc/locale.gen, and running locale-gen. Equivalently, commenting entries disables their respective locales.

Both are stating to uncomment entries rather than append entries and this appears to be intention when consider the /etc/locale.gen file is created from the localedata/SUPPORTED file from glibc. The action of uncommenting will leave one with only supported locales.

The locale you tested has the @euro modifier with UTF-8 encoding. All of the locales in /etc/locale.gen that have that modifier have the encoding ISO-8859-15. Here are two more locales that I'm not sure make sense but locale-gen generates them:

Generating locales...
  en_US.ISO-8859-15... done
  en_US.GB18030... done
Generation complete.

Also, I'm working on just a few more changes that I would like to commit to this pull request.

Edit: This note is on Gentoo Wiki on the Localization/Guide page under the Generating specific locales section and relates to the locale you were testing en_IE@euro UTF-8:

Note Use an @euro value from /usr/share/i18n/SUPPORTED as the locale when using the Euro currency symbol (€) on non UTF-8 based locales.

codefiles avatar Aug 23 '22 01:08 codefiles

Not the right place for this but these Arch Linux glibc package changes might be improvements:

  • Provide localedata/SUPPORTED as /usr/share/i18n/SUPPORTED. Though the contents of the glibc localedata/SUPPORTED file is used to create /etc/locale.gen, it would be beneficial if it was also supplied separately as /usr/share/i18n/SUPPORTED since the contents of /etc/locale.gen can be overwritten as it is intended to be edited. It appears this file is being distributed in this manner with the glibc packages in other distributions such as Debian and Gentoo.
  • Remove # en_US.UTF-8 UTF-8 line from examples in locale.gen.txt since that line, instead of the later #en_US.UTF-8 UTF-8, can become uncommented in /etc/locale.gen with the use of localectl set-locale en_US.UTF-8 or remove all examples since they are superfluous.

codefiles avatar Sep 05 '22 03:09 codefiles

Here is what this pull request looks like working with multiple locales:

>>> from archinstall.lib.locale_helpers import Locale, LocaleUtils
>>> locale_names = ['en_US.UTF-8', 'de_DE.UTF-8', 'de_DE.ISO-8859-1', 'de_DE.ISO-8859-15@euro']
>>> locales = []
>>> for locale_name in locale_names:
...     locales.append(Locale(locale_name))
...
>>> LocaleUtils(locales).run()
Uncommented entries in locale-gen configuration file
  de_DE.UTF-8 UTF-8
  de_DE ISO-8859-1
  de_DE@euro ISO-8859-15
  en_US.UTF-8 UTF-8
Generating locales...
  de_DE.UTF-8...
  de_DE.ISO-8859-1...
  de_DE.ISO-8859-15@euro...
  en_US.UTF-8...
Generation complete.
System locale set to en_US.UTF-8
True

This is ready to be reviewed and tested. More to come in future pull requests as mentioned here https://github.com/archlinux/archinstall/issues/1435#issuecomment-1236491516.

codefiles avatar Sep 05 '22 04:09 codefiles

Great work on this, I'll pass along the message about glibc and the examples file. I'll test this most likely next week, as this week is a bit busy and scheduled.

Torxed avatar Sep 05 '22 09:09 Torxed

I appreciate you passing along the message about the Arch Linux glibc package. Would you please link to the comment when passing the information along? I have just edited that comment for additional clarity.

codefiles avatar Sep 05 '22 12:09 codefiles

@codefiles As per suggested, your change is now merged upstream (https://github.com/archlinux/svntogit-packages/commit/3c537bee8d7921bc0af775fdb4d69124026532ac). Great work!

Torxed avatar Sep 06 '22 14:09 Torxed

Another message regarding the Arch Linux glibc package:

Note The patch file supported.diff implements everything below.

  • Debian [0][1] and Gentoo [2] process localedata/SUPPORTED to generate /usr/share/i18n/SUPPORTED. Use the following sed command on localedata/SUPPORTED to generate /usr/share/i18n/SUPPORTED.
sed -e '1,3d' -e 's|/| |g' -e 's| \\||g'
  • Modify the line with the sed command for /etc/locale.gen to remove trailing whitespace,
-  sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' -e 's|^|#|g' \
+  sed -e '1,3d' -e 's|/| |g' -e 's| \\||g' -e 's|^|#|g' \

or use the following sed command on the generated /usr/share/i18n/SUPPORTED rather than localedata/SUPPORTED.

sed 's|^|#|g'
  • Consider adding a note about /usr/share/i18n/SUPPORTED to locale.gen.txt.
-#  A list of supported locales is included in this file.
-#  Uncomment the ones you need.
+#  A list of supported locales is given in /usr/share/i18n/SUPPORTED
+#  and is included in this file. Uncomment the needed locales below.

[0] https://salsa.debian.org/glibc-team/glibc/-/blob/1ad03bfd35cb0258dc1e73c1aab4b9bc128a3ea2/debian/rules.d/build.mk#L240-244 [1] https://salsa.debian.org/glibc-team/glibc/-/blob/1ad03bfd35cb0258dc1e73c1aab4b9bc128a3ea2/debian/generate-supported.mk [2] https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-libs/glibc/glibc-9999.ebuild?id=14ae8ab7798898eb196ddecd9c58ff1f3e8ba306#n1401

codefiles avatar Sep 07 '22 01:09 codefiles

sed -e '1,3d' -e 's|/| |g' -e 's| \\||g'

Isn't this a bit of a workaround for an upstream glibc issue tho? Or did I misunderstood why we remove trailing whitespaces?

Torxed avatar Sep 07 '22 07:09 Torxed

sed -e '1,3d' -e 's|/| |g' -e 's| \\||g'

Isn't this a bit of a workaround for an upstream glibc issue tho? Or did I misunderstood why we remove trailing whitespaces?

You are misunderstanding why I'm requesting trailing whitespace to be removed. In the glibc PKGBUILD a sed command is being run on localedata/SUPPORTED to generate the /etc/locale.gen file. The command in the PKGBUILD is sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' -e 's|^|#|g' and the -e 's|\\| |g' portion replaces the backslash at the end of a line containing a locale such as aa_DJ.UTF-8/UTF-8 \ with a space. Changing the -e 's|\\| |g' portion to -e 's| \\||g' removes the space at the end before the backslash and does not add a space.

sed 's|\\| |g'

-aa_DJ.UTF-8/UTF-8 \
+aa_DJ.UTF-8/UTF-8  

sed 's| \\||g'

-aa_DJ.UTF-8/UTF-8 \
+aa_DJ.UTF-8/UTF-8

The following demonstrates that it makes a difference:

$ cat localedata/SUPPORTED
# This file names the currently supported and somewhat tested locales.
# If you have any additions please file a glibc bug report.
SUPPORTED-LOCALES=\
aa_DJ.UTF-8/UTF-8 \
aa_DJ/ISO-8859-1 \
aa_ER/UTF-8 \
aa_ER@saaho/UTF-8 \
...
$ sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' localedata/SUPPORTED
aa_DJ.UTF-8 UTF-8
aa_DJ ISO-8859-1
aa_ER UTF-8
aa_ER@saaho UTF-8
...
$ sed -e '1,3d' -e 's|/| |g' -e 's| \\||g' localedata/SUPPORTED
aa_DJ.UTF-8 UTF-8
aa_DJ ISO-8859-1
aa_ER UTF-8
aa_ER@saaho UTF-8
...
$ diff <(!-2) <(!!)
diff -qs <(sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' localedata/SUPPORTED) <(sed -e '1,3d' -e 's|/| |g' -e 's| \\||g' localedata/SUPPORTED)
Files /dev/fd/63 and /dev/fd/62 differ

The /usr/share/i18n/SUPPORTED file can be generated then sed 's|^|#|g' can be run on it to generate the /etc/locale.gen, just adding the processed SUPPORTED file as commented lines to the /etc/locale.gen file.

codefiles avatar Sep 07 '22 10:09 codefiles

This may also clear things up:

$ curl -sO "https://salsa.debian.org/glibc-team/glibc/-/raw/sid/debian/generate-supported.mk"
$ make -sf generate-supported.mk IN=localedata/SUPPORTED OUT=SUPPORTED.debian
$ sed \
	-e "/^#/d" \
	-e "/SUPPORTED-LOCALES=/d" \
	-e "s: \\\\::g" -e "s:/: :g" \
	localedata/SUPPORTED > SUPPORTED.gentoo
$ diff -qs SUPPORTED.debian SUPPORTED.gentoo
Files SUPPORTED.debian and SUPPORTED.gentoo are identical
$ diff -qs SUPPORTED.debian <(sed -e '1,3d' -e 's|/| |g' -e 's|\\| |g' localedata/SUPPORTED)
Files SUPPORTED.debian and /dev/fd/63 differ
$ diff -qs SUPPORTED.debian <(sed -e '1,3d' -e 's|/| |g' -e 's| \\||g' localedata/SUPPORTED)
Files SUPPORTED.debian and /dev/fd/63 are identical

codefiles avatar Sep 07 '22 11:09 codefiles

I see, I'll pass it along! :) Apologies for not getting it straight away, I should maybe have spent a few more minutes reading through your initial example. I was in a bit of a rush :)

Torxed avatar Sep 07 '22 12:09 Torxed

It is ok, should be a lot clearer now. It is hard to find a middle ground between being terse and verbose when describing issues. Terse descriptions might lack clarity and verbose descriptions risk being ignored.

codefiles avatar Sep 07 '22 13:09 codefiles

In regards to the Arch Linux glibc package, two of the bullets in my earlier comment, https://github.com/archlinux/archinstall/pull/1423#issuecomment-1238793882, have now been resolved by:

  • https://github.com/archlinux/svntogit-packages/commit/d6f2e1f80af0f94e6b024552ad9a94842002356d
  • https://github.com/archlinux/svntogit-packages/commit/b09829c6ca6b116f8d16db9f7de7fde72f47e758

@freswa, the revised patch file supported.diff targets the second bullet from my earlier comment, https://github.com/archlinux/archinstall/pull/1423#issuecomment-1238793882, that is not yet addressed. If implemented it would:

  • Eliminate the whitespace at the end of entries in /etc/locale.gen.
  • Deduplicate the sed commands that are working on the glibc file localedata/SUPPORTED to create /etc/locale.gen and /usr/share/i18n/SUPPORTED.
  • Reduce maintenance of the PKGBUILD should the glibc file localedata/SUPPORTED change in a way that required modification of the sed commands.

codefiles avatar Sep 22 '22 04:09 codefiles

The latest commit requires the next release of the glibc package for https://github.com/archlinux/svntogit-packages/commit/b09829c6ca6b116f8d16db9f7de7fde72f47e758.

codefiles avatar Sep 29 '22 03:09 codefiles

Perfect, I'll merge this as soon as https://archlinux.org/packages/core/x86_64/glibc/ is updated (the upstream commit is newer than latest package release)

Torxed avatar Sep 29 '22 08:09 Torxed

The glibc package has been updated.

Be aware that if this is merged the current ISO (2022.10.01) will not meet the required version of glibc when using master.

codefiles avatar Oct 06 '22 00:10 codefiles

Good heads up! Our ISO build and tests should still work as it always build the latest, but people tend to run the git version on the officiall ISO so, gut feeling says this will be the last PR into the next release, as close to the 1-11-2022 as possible, is that ok?

Torxed avatar Oct 06 '22 06:10 Torxed

Let's not merge this yet. I feel that it is stale now so I am changing this to a draft till I review it again.

codefiles avatar Sep 20 '23 12:09 codefiles