perl spews warning about locale when LC_ALL not set

Open diekhans opened this issue 3 years ago • 14 comments

Description

perl 5, version 34, subversion 1 (v5.34.1) built for x86_64-linux

% perl --version
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = (unset),
        LC_PAPER = "en_US.UTF-8@letter",
        LC_ADDRESS = "en_US.UTF-8",
        LC_MONETARY = "en_US.UTF-8",
        LC_NUMERIC = "C",
        LC_TELEPHONE = "en_US.UTF-8",
        LC_MESSAGES = "en_US.UTF-8",
        LC_IDENTIFICATION = "en_US.UTF-8",
        LC_COLLATE = "C",
        LC_MEASUREMENT = "en_US.UTF-8",
        LC_CTYPE = "C",
        LC_TIME = "en_US.UTF-8",
        LC_NAME = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

This is perl 5, version 34, subversion 1 (v5.34.1) built for x86_64-linux

It is legitimate and necessary to have LC_ALL unset if one wishes to use UTF-8 for everything except sorting.

Steps to Reproduce

#!/usr/bin/bash -e

unset LC_ALL
export LC_PAPER=en_US.UTF-8@letter
export LC_ADDRESS=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_NUMERIC=C
export LC_TELEPHONE=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_IDENTIFICATION=en_US.UTF-8
export LC_COLLATE=C
export LANG=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_CTYPE=C
export LC_TIME=en_US.UTF-8
export LC_NAME=en_US.UTF-8
echo "====== without LC_ALL ======"
perl --version

export LC_ALL=en_US.UTF-8
echo "====== with LC_ALL ======"
perl --version

Expected behavior

No warning message

Perl configuration

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "en_US.UTF-8",
	LC_ALL = (unset),
	LC_PAPER = "en_US.UTF-8@letter",
	LC_ADDRESS = "en_US.UTF-8",
	LC_MONETARY = "en_US.UTF-8",
	LC_NUMERIC = "C",
	LC_TELEPHONE = "en_US.UTF-8",
	LC_MESSAGES = "en_US.UTF-8",
	LC_IDENTIFICATION = "en_US.UTF-8",
	LC_COLLATE = "C",
	LC_MEASUREMENT = "en_US.UTF-8",
	LC_CTYPE = "C",
	LC_TIME = "en_US.UTF-8",
	LC_NAME = "en_US.UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Summary of my perl5 (revision 5 version 34 subversion 1) configuration:
   
  Platform:
    osname=linux
    osvers=3.10.0-1160.49.1.el7.x86_64
    archname=x86_64-linux
    uname='linux hgwdev 3.10.0-1160.49.1.el7.x86_64 #1 smp tue nov 30 15:51:32 utc 2021 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/cluster/home/markd/opt/centos7.0/x86_64 -Dldflags= -L/cluster/home/markd/opt/centos7.0/x86_64/lib -Wl,-rpath -Wl,/cluster/home/markd/opt/centos7.0/x86_64/lib'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.8.5 20150623 (Red Hat 4.8.5-44)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -L/cluster/home/markd/opt/centos7.0/x86_64/lib -Wl,-rpath -Wl,/cluster/home/markd/opt/centos7.0/x86_64/lib -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /usr/lib64 /usr/local/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.17.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/cluster/home/markd/opt/centos7.0/x86_64/lib -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under linux
  Compiled at May 18 2022 17:46:53
  @INC:
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/site_perl/5.34.1/x86_64-linux
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/site_perl/5.34.1
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/5.34.1/x86_64-linux
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/5.34.1
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/site_perl/5.30.3
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/site_perl/5.28.1
    /cluster/home/markd/opt/centos7.0/x86_64/lib/perl5/site_perl

diekhans avatar Jun 12 '22 16:06 diekhans

On 6/12/22 10:59, Mark Diekhans wrote:

It is legitimate and necessary to have LC_ALL unset if one wishes to use UTF-8 for everything except sorting.

Before I look further into this, I'm trying to understand your statement just above.

Except for a few POSIX module functions, Perl effectively ignores the locale except within the scope of a 'use locale' statement. LC_ALL should be irrelevant.

Without investigating, I can't say why it is giving this message, but first I'm trying to understand what you're trying to do.

khwilliamson avatar Jun 12 '22 19:06 khwilliamson

Hi Karl,

I am trying to run any Perl program without having it spew a large number of warnings that obscure the details of whatever I am trying to do.

This happens whenever the perl interpreter is invoked, even for perl --version. Setting LC_ALL prevents this from happening; however, this causes other problems and should not be required.

 LC_ALL       Will override the setting of all other LC_* variables.

If I set LC_ALL=en_US.UTF-8, the sort command will not sort by ASCII. If I set LC_ALL=C, UTF-8 display is broken.
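The precedence rule quoted above can be sketched in shell. This is only an illustration: `effective_locale` is a hypothetical helper, not part of any tool, and the final fallback to "C" is an assumption (POSIX leaves the default implementation-defined).

```shell
# Print the locale name a category would take from the environment,
# using the usual precedence: LC_ALL, then LC_<category>, then LANG.
# Runs in a subshell so its helper variables do not leak to the caller.
effective_locale() (
    category=$1
    # Indirectly read the variable named by $category (e.g. LC_COLLATE).
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: nothing set means the "C" locale, as on glibc.
        printf '%s\n' "C"
    fi
)
```

With LC_ALL unset, LANG=en_US.UTF-8 and LC_COLLATE=C, `effective_locale LC_COLLATE` prints C while `effective_locale LC_TIME` prints en_US.UTF-8. Setting LC_ALL makes every category print the LC_ALL value, which is why it cannot be used to get UTF-8 everywhere except sorting.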

Mark

diekhans avatar Jun 12 '22 19:06 diekhans

On 6/12/22 13:54, Mark Diekhans wrote:

If I set LC_ALL=en_US.UTF-8, the sort command will not sort by ASCII. If I set LC_ALL=C, UTF-8 display is broken

What happens if you don't set LC_ALL to anything? On my machine,

LC_COLLATE=C ./perl -Ilib -MPOSIX -le 'print setlocale(LC_ALL)'

prints the configuration you expect.

If that isn't happening, a workaround may be to set the environment variable PERL_BADLANG=0. On my machine, unsetting LC_ALL still works; I'm guessing you have a different underlying configuration, which would need to be investigated.

khwilliamson avatar Jun 12 '22 20:06 khwilliamson

LC_ALL is not set in my environment, which makes everything work except the Perl warning.

% echo @${LC_ALL}@
@@

% LC_COLLATE=C perl -Ilib -MPOSIX -le 'print setlocale(LC_ALL)'
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = (unset),
        LC_COLLATE = "C",
        LC_PAPER = "en_US.UTF-8@letter",
        LC_ADDRESS = "en_US.UTF-8",
        LC_MONETARY = "en_US.UTF-8",
        LC_NUMERIC = "C",
        LC_TELEPHONE = "en_US.UTF-8",
        LC_MESSAGES = "en_US.UTF-8",
        LC_IDENTIFICATION = "en_US.UTF-8",
        LC_MEASUREMENT = "en_US.UTF-8",
        LC_CTYPE = "C",
        LC_TIME = "en_US.UTF-8",
        LC_NAME = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
en_US.UTF-8

Setting LC_ALL fixes it:

% LC_ALL=C perl -Ilib -MPOSIX -le 'print setlocale(LC_ALL)'
C

% LC_ALL=en_US.UTF-8 perl -Ilib -MPOSIX -le 'print setlocale(LC_ALL)'
en_US.UTF-8

This is with my own compiled version of v5.34.1, however the Centos-default version of v5.16.3 shows the same behavior.

diekhans avatar Jun 12 '22 21:06 diekhans

Are you sure it is LC_ALL that is causing this? I noticed that you are using export LC_PAPER=en_US.UTF-8@letter; can you try without setting "LC_PAPER"?

(I can sort of reproduce your issue by setting LC_PAPER but I'm not sure if that is the only issue/causing the issue you're observing.)

bram-perl avatar Jul 24 '22 12:07 bram-perl

My guess is that it is the @letter that is causing the problem. It appears from code reading that perl doesn't handle the '@' properly.

khwilliamson avatar Jul 24 '22 15:07 khwilliamson

Yes, I just confirmed that @letter is the problem.

I can't find documentation about @letter on GNU, and I am not sure why this is in my locale. Perhaps some other UNIX.

If this is not a valid setting, it would save time if Perl complained about which locale env var is bad, rather than giving this generic message.

diekhans avatar Jul 24 '22 15:07 diekhans

I'm (currently) not convinced that perl is to blame..

Experimenting a bit more with it (on an older debian system):

    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        char * foo;
        foo = setlocale(LC_PAPER, "en_US.UTF-8");
        printf("LC_PAPER = %s\n", foo);

        foo = setlocale(LC_PAPER, "en_US.UTF-8@LETTER");
        printf("LC_PAPER(2) = %s\n", foo);
    }

Running:

    $ ./a.out
    LC_PAPER = en_US.UTF-8
    LC_PAPER(2) = (null)

Checking the locales that are installed:

    $ locale -a
    C
    C.UTF-8
    en_US.utf8
    POSIX

strace'ing ./a.out (relevant part only)

    $ strace ./a.out
    ...
    open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
    ...
    open("/usr/lib/locale/en_US.UTF-8@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en_US.utf8@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en_US@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en.UTF-8@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en.utf8@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en@LETTER/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en_US.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en_US.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en_US/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    open("/usr/lib/locale/en/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

So it found and read the locale-archive but it still went on to look for LC_PAPER (which doesn't exist)... [I do not know if that is expected behavior]

Listing the locale-archive:

    $ localedef --list-archive  -v
    256420   195c0   1  8826550218ab82ebe83ec209894162e5 en_US.utf8/LC_CTYPE
        54  1879f0   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_NUMERIC
      2454  187a30   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_TIME
    1243766   57f70   1  5aee00a13cb3e717fd8cb6dbfc5ebd4c en_US.utf8/LC_COLLATE
       286  1883d0   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_MONETARY
        57  1884f0   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
        34  188530   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_PAPER
        77  188560   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_NAME
       155  1885b0   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_ADDRESS
        59  188650   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_TELEPHONE
        23  188690   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_MEASUREMENT
       361  1886b0   1+ e184ed98f4e05dca7c06a4666b8f60cd en_US.utf8/LC_IDENTIFICATION

It does pretend to have a LC_PAPER..

Installing (on the older debian system) the 'locales-all' package and rerunning:

    $ ./a.out
    LC_PAPER = en_US.UTF-8
    LC_PAPER(2) = en_US.UTF-8@LETTER

So now it was able to set LC_PAPER to 'en_US.UTF-8@LETTER' (or even to 'en_US.UTF-8@FOOBAR')... I believe the issue is with the installed locales on the system but I don't know of a way yet to fully confirm it..

bram-perl avatar Jul 24 '22 15:07 bram-perl

Looking at the source (in 5.37) again, I now don't see how the @paper would have broken perl.

I haven't seen the @ subforms documented, but that's what they appear to me to be. There are some locales that have @euro in them, presumably to indicate that the Euro is the currency to use instead of a generic one.

I'll look into seeing the feasibility of giving a better message, but when things fail at start-up, perl is paranoid about the basic sanity of the system, so views every result as unreliable, and would be reluctant to add more queries to the damage.

khwilliamson avatar Jul 24 '22 17:07 khwilliamson

My bashrc has the comment:

# evince needs this to default to letter
export LC_PAPER="en_US.UTF-8@letter"

I see a few random mentions of this http://ubuntuliving.blogspot.com/2008/07/default-paper-size-in-evince.html

It also seems to cause confusion on CentOS 6 GNU/Linux, see below.

I can't find anything that says the @letter is a part of POSIX. Perhaps it is something that the evince authors made up???

% LC_PAPER="en_US.UTF-8" locale -ck LC_PAPER
LC_PAPER
height=279
width=216
paper-codeset="UTF-8"

% @.***" locale -ck LC_PAPER
locale: Cannot set LC_ALL to default locale: No such file or directory
LC_PAPER
height=297
width=210
paper-codeset="ANSI_X3.4-1968"

% LC_PAPER="en_GB.UTF-8" locale -ck LC_PAPER
LC_PAPER
height=297
width=210
paper-codeset="UTF-8"

% @.***" locale -ck LC_PAPER
locale: Cannot set LC_ALL to default locale: No such file or directory
LC_PAPER
height=297
width=210
paper-codeset="ANSI_X3.4-1968"

% @.***" locale -ck LC_PAPER
locale: Cannot set LC_ALL to default locale: No such file or directory
LC_PAPER
height=297
width=210
paper-codeset="ANSI_X3.4-1968"

diekhans avatar Jul 25 '22 01:07 diekhans

Just for reference: the locale format is documented as[^1]:

    A locale name is typically of the form language[_territory][.codeset][@modifier]
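To make that format concrete, here is a small sketch that splits a name into its four parts using only POSIX parameter expansion. `split_locale_name` is a hypothetical helper written for illustration; it does no validation and simply assumes the input follows the documented form.

```shell
# Split language[_territory][.codeset][@modifier] into its parts and
# print one "key=value" line per part. Missing parts print as empty.
# Runs in a subshell so its helper variables do not leak to the caller.
split_locale_name() (
    rest=$1
    modifier= codeset= territory=
    # Peel the optional parts off from right to left.
    case $rest in *@*) modifier=${rest#*@};  rest=${rest%%@*} ;; esac
    case $rest in *.*) codeset=${rest#*.};   rest=${rest%%.*} ;; esac
    case $rest in *_*) territory=${rest#*_}; rest=${rest%%_*} ;; esac
    printf 'language=%s\nterritory=%s\ncodeset=%s\nmodifier=%s\n' \
        "$rest" "$territory" "$codeset" "$modifier"
)
```

For `en_US.UTF-8@letter` this reports language en, territory US, codeset UTF-8 and modifier letter, so the value from the original report is at least syntactically a valid locale name; whether such a locale is installed is a separate question.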

Playing a bit more with it:

On an older debian system without the 'locales-all' package:

| LC_PAPER | height | width | paper-codeset |
|---|---|---|---|
| en_US.UTF-8 | 279 | 216 | UTF-8 |
| C | 297 | 210 | ANSI_X3.4-1968 |
| en_US.UTF-8@letter[^2] | 297 | 210 | ANSI_X3.4-1968 |

On an older debian system with the 'locales-all' package:

| LC_PAPER | height | width | paper-codeset |
|---|---|---|---|
| en_US.UTF-8 | 279 | 216 | UTF-8 |
| C | 297 | 210 | ANSI_X3.4-1968 |
| en_US.UTF-8@letter[^3] | 279 | 216 | UTF-8 |
| nl_BE.UTF-8 | 297 | 210 | UTF-8 |
| nl_BE.UTF-8@letter | 297 | 210 | UTF-8 |

So it appears the modifier is ignored..

A look at evince

Since your .bashrc indicates this is for evince I've decided to take a look at their source..

In evince git sources:

    $ git grep LC_PAPER
    $

=> no matches..

Looking further it appears to use gtk_page_setup_get_paper_size to get the paper size.

Switching to gtk sources, it does find a match for LC_PAPER: https://github.com/GNOME/gtk/blob/main/gtk/gtkpapersize.c#L768

Skimming the code shows that it does look at LC_PAPER but it doesn't look at the 'modifier' at all.. Taking a quick look at the history suggests that it never did.. (Note: just a quick look, I didn't trace it fully)

I suppose it is still possible that your distro patched something so that there is something that does look at the 'modifier' for LC_PAPER..

Now back to perl: can the warning be improved?

    $ ./Configure -des -Dusedevel -DDEBUGGING
    $ LC_ALL= LC_PAPER='en_US.UTF-8@letter' PERL_DEBUG_LOCALE_INIT=1 ./perl -e1
    locale.c:3529: setlocale(LC_ALL, "") returned NULL
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = "",
            LC_PAPER = "en_US.UTF-8@letter",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    locale.c:3529: setlocale(LC_ALL, "en_US.UTF-8") returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_NUMERIC, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_CTYPE, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_COLLATE, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_TIME, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_MESSAGES, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_MONETARY, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_ADDRESS, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_IDENTIFICATION, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_MEASUREMENT, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_PAPER, NULL) returned "en_US.UTF-8"
    locale.c:3554: setlocale(LC_TELEPHONE, NULL) returned "en_US.UTF-8"

It tried to set the LC_ALL locale to the empty string and that returned NULL... And a return of NULL from setlocale means it failed...

Now a simple test program:

    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        char * foo;
        foo = setlocale(LC_ALL, "");
        printf("LC_ALL = %s\n", foo);
    }

Running:

    $ LC_ALL=en_US.UTF-8 ./a.out
    LC_ALL = en_US.UTF-8

    $ LC_ALL= LC_PAPER='en_US.UTF-8' ./a.out
    LC_ALL = en_US.UTF-8

    $ LC_ALL= LC_PAPER='C' ./a.out
    LC_ALL = LC_CTYPE=en_US.UTF-8;LC_NUMERIC=en_US.UTF-8;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=C;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8

    $ LC_ALL= LC_PAPER='en_US.UTF-8@letter' ./a.out
    LC_ALL = (null)

So the problem is that setlocale(LC_ALL, "") with LC_ALL= LC_PAPER='en_US.UTF-8@letter' returns NULL...

So as far as I can tell: perl isn't aware of what locale setting is incorrect so it can't properly warn...

Building perl and pretending LC_ALL doesn't exist[^4] and running it:

    $ LC_ALL= LC_PAPER='en_US.UTF-8@letter'  ./perl -e1
    perl: warning: Setting locale failed for the categories:
            LC_PAPERperl: warning: Please check that your locale settings:
            LANGUAGE = "en_US:en",
            LC_ALL = "",
            LC_PAPER = "en_US.UTF-8@letter",
            LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Note: this output is missing some whitespace.. It should've looked like:

    perl: warning: Setting locale failed for the categories:
            LC_PAPER
    perl: warning: Please check that your locale settings:
    ...

And adding more failed locales makes the output even worse :-(

    $ LC_ALL= LC_NUMERIC='xx_YY' LC_PAPER='en_US.UTF-8@letter'  ./perl -e1
    perl: warning: Setting locale failed for the categories:
            LC_NUMERICLC_PAPERperl: warning: Please check that your locale settings:

-> Missing ' ' between the categories (or missing newlines)

TLDR-version:

To me it only seems possible to improve the warning message by not using LC_ALL.. (Or calling setlocale for each category (with the empty string) when LC_ALL failed)

[^1]: man 3 setlocale, https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html
[^2]: Shows the error/warning locale: Cannot set LC_ALL to default locale: No such file or directory
[^3]: This is not a very good test; it seems to fall back to 'en_US.UTF-8', which by default is letter, so the modifier is not used at all
[^4]: by adding an #undef LC_ALL in locale.c

bram-perl avatar Jul 25 '22 16:07 bram-perl

I have a somewhat work in progress patch that changes the output:

The patch:
diff --git a/locale.c b/locale.c
index 5aa194d..498aa5d 100644
--- a/locale.c
+++ b/locale.c
@@ -3277,6 +3277,7 @@ Perl_init_i18nl10n(pTHX_ int printwarn)
     const char * const lc_all     = PerlEnv_getenv("LC_ALL");
     const char * const lang       = PerlEnv_getenv("LANG");
     bool setlocale_failure = FALSE;
+    bool skip_setlocale_categories = FALSE;
     unsigned int i;

     /* A later getenv() could zap this, so only use here */
@@ -3483,6 +3484,7 @@ Perl_init_i18nl10n(pTHX_ int printwarn)
     for (i= 0; i < trial_locales_count; i++) {
         const char * trial_locale = trial_locales[i];
         setlocale_failure = FALSE;
+        skip_setlocale_categories = FALSE;

         if (i > 0) {
 #  ifdef SYSTEM_DEFAULT_LOCALE
@@ -3524,6 +3526,13 @@ Perl_init_i18nl10n(pTHX_ int printwarn)
         DEBUG_LOCALE_INIT(LC_ALL_INDEX_, trial_locale, sl_result[LC_ALL_INDEX_]);
         if (! sl_result[LC_ALL_INDEX_]) {
             setlocale_failure = TRUE;
+
+            if (lc_all && strNE(lc_all, "")) {
+                /* when the env var LC_ALL is set then we can skip the setlocale
+                 *  for each individual category since that will just reuse that
+                 *  value. Which really means: they would all fail. */
+                skip_setlocale_categories = TRUE;
+            }
         }
         else {
             /* Since LC_ALL succeeded, it should have changed all the other
@@ -3534,11 +3543,19 @@ Perl_init_i18nl10n(pTHX_ int printwarn)
              * fail, whereas setting LC_ALL succeeds, leaving LC_COLLATE set to
              * the POSIX locale. */
             trial_locale = NULL;
+            skip_setlocale_categories = TRUE;
         }

 #  endif /* LC_ALL */

-        if (! setlocale_failure) {
+        if (! setlocale_failure || ! skip_setlocale_categories) {
+            /* The above condition doesn't really make sense. It will
+             * always evalaute to TRUE.
+             *
+             * Reason for adding it like this:
+             * - on blead `curlocales[]` is only used on failure
+             * - on khw's work-in-progress branch `curlocales[]` is used
+             *   on success. */
             unsigned int j;
             for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
                 Safefree(curlocales[j]);
@@ -3549,10 +3566,10 @@ Perl_init_i18nl10n(pTHX_ int printwarn)
                 curlocales[j] = savepv(curlocales[j]);
                 DEBUG_LOCALE_INIT(j, trial_locale, curlocales[j]);
             }
+        }

-            if (LIKELY(! setlocale_failure)) {  /* All succeeded */
-                break;  /* Exit trial_locales loop */
-            }
+        if (LIKELY(! setlocale_failure)) {  /* All succeeded */
+            break;  /* Exit trial_locales loop */
         }

         /* Here, something failed; will need to try a fallback. */
@@ -3563,24 +3580,20 @@ Perl_init_i18nl10n(pTHX_ int printwarn)

             if (locwarn) { /* Output failure info only on the first one */

-#  ifdef LC_ALL
-
-                PerlIO_printf(Perl_error_log,
-                "perl: warning: Setting locale failed.\n");
-
-#  else /* !LC_ALL */
-
                 PerlIO_printf(Perl_error_log,
                 "perl: warning: Setting locale failed for the categories:\n");

-                for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
-                    if (! curlocales[j]) {
-                        PerlIO_printf(Perl_error_log, "\t%s\n", category_names[j]);
+                if (skip_setlocale_categories) {
+                    PerlIO_printf(Perl_error_log, "\tLC_ALL\n");
+                }
+                else {
+                    for (j = 0; j < NOMINAL_LC_ALL_INDEX; j++) {
+                        if (! curlocales[j]) {
+                            PerlIO_printf(Perl_error_log, "\t%s\n", category_names[j]);
+                        }
                     }
                 }

-#  endif /* LC_ALL */
-
                 PerlIO_printf(Perl_error_log,
                     "perl: warning: Please check that your locale settings:\n");

I consider this to be WIP because:

  • I don't know if this is a good idea to begin with;
  • changes made will conflict with the work @khwilliamson is doing
  • handling of invalid LANG is not good enough
  • the error is less clear/might not be good enough: when LC_PAPER is set incorrectly but LC_NUMERIC is set correctly then both of these will fall back to C. (Where one might read the error message as saying that only LC_PAPER fell back to C.)
  • ...

The output with the above WIP patch:

  • invalid LC_ALL:
    $ LC_ALL=aa_BB ./perl -e1
    perl: warning: Setting locale failed for the categories:
         LC_ALL
    perl: warning: Please check that your locale settings:
         LANGUAGE = "en_US:en",
         LC_ALL = "aa_BB",
         LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
    
  • invalid LC_PAPER:
    $  LC_ALL= LC_PAPER=aa_BB ./perl -e1
    perl: warning: Setting locale failed for the categories:
         LC_PAPER
    perl: warning: Please check that your locale settings:
         LANGUAGE = "en_US:en",
         LC_ALL = "",
         LC_PAPER = "aa_BB",
         LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
    
  • invalid LANG:
    $ LC_ALL= LANG=cc_DD ./perl -e1
    perl: warning: Setting locale failed for the categories:
         LC_NUMERIC
         LC_CTYPE
         LC_COLLATE
         LC_TIME
         LC_MESSAGES
         LC_MONETARY
         LC_ADDRESS
         LC_IDENTIFICATION
         LC_MEASUREMENT
         LC_PAPER
         LC_TELEPHONE
    perl: warning: Please check that your locale settings:
         LANGUAGE = "en_US:en",
         LC_ALL = "",
         LANG = "cc_DD"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    

Changing that last one to:

    perl: warning: Setting locale failed for the categories:
         LANG

might be possible but that doesn't really make sense since LANG isn't a locale category.

bram-perl avatar Aug 07 '22 15:08 bram-perl

@diekhans Please try this again on 5.37.3. I am no longer seeing any messages.

khwilliamson avatar Aug 21 '22 02:08 khwilliamson

The complaint still happens. The root cause is that there is no "en_US.UTF-8@letter" locale installed. For some reason, evince needs (or needed) this, which I think may be an invalid use of the locale system.

I believe the best solution is a solution-directed error message. However, how to "check locale settings" is not straightforward. All I have been able to do is grep through the output of "locale -a". Perhaps suggesting this in the error message would be a good solution. It doesn't appear that this is a common problem.
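A rough sketch of that manual check in shell follows. Both function names are hypothetical, and the codeset normalization (lowercase, punctuation dropped, so UTF-8 becomes utf8) is an assumption based on how glibc's "locale -a" prints names such as en_US.utf8. A name missing from the list is only a hint: as shown earlier in this thread, setlocale may still accept it on some systems.

```shell
# Normalize the codeset part of a locale name (assumption: glibc style,
# lowercase with punctuation removed). Language, territory and modifier
# are left untouched.
normalize_locale_name() (
    name=$1
    modifier=
    case $name in *@*) modifier="@${name#*@}"; name=${name%%@*} ;; esac
    case $name in
        *.*)
            codeset=$(printf '%s' "${name#*.}" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
            name="${name%%.*}.$codeset"
            ;;
    esac
    printf '%s%s\n' "$name" "$modifier"
)

# Print "VAR=value" for each locale variable whose value is not in the
# list of installed locales. $1 is that list, one name per line; when it
# is omitted, the output of `locale -a` is used.
report_unknown_locales() (
    installed=${1-$(locale -a 2>/dev/null)}
    # Normalize the installed names too, since entries such as C.UTF-8
    # can appear un-normalized in the list.
    installed=$(printf '%s\n' "$installed" | while IFS= read -r line; do
        normalize_locale_name "$line"
    done)
    for var in LANG LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE \
               LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS \
               LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION; do
        eval "value=\${$var:-}"
        [ -n "$value" ] || continue
        wanted=$(normalize_locale_name "$value")
        if ! printf '%s\n' "$installed" | grep -qxF -- "$wanted"; then
            printf '%s=%s\n' "$var" "$value"
        fi
    done
)
```

Run with no argument, `report_unknown_locales` checks the current environment against "locale -a". With the settings from this report and an installed list of C, C.UTF-8, en_US.utf8 and POSIX, it prints only the LC_PAPER line, which is the single-variable diagnosis the generic warning does not give.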

% perl5.37.3 --version
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = (unset),
        LC_PAPER = "en_US.UTF-8@letter",
        LC_ADDRESS = "en_US.UTF-8",
        LC_MONETARY = "en_US.UTF-8",
        LC_NUMERIC = "C",
        LC_TELEPHONE = "en_US.UTF-8",
        LC_MESSAGES = "en_US.UTF-8",
        LC_IDENTIFICATION = "en_US.UTF-8",
        LC_COLLATE = "C",
        LC_MEASUREMENT = "en_US.UTF-8",
        LC_CTYPE = "C",
        LC_TIME = "en_US.UTF-8",
        LC_NAME = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

This is perl 5, version 37, subversion 3 (v5.37.3) built for x86_64-linux

diekhans avatar Aug 21 '22 19:08 diekhans

On 7/24/22 19:36, Mark Diekhans wrote:

I can't find anything that says the @letter is a part of POSIX. Perhaps it is something that the evince authors made up???

This was pointed out to me:

https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html

khwilliamson avatar Oct 11 '22 09:10 khwilliamson