lib/locale.t pseudorandomly fails on HP-UX x2 B.11.31

Open khwilliamson opened this issue 1 year ago • 3 comments

Description

This is the most interesting bug to debug I've had in quite a while. The bug turns out to be in HP-UX, but there are things that can be done in lib/locale.t to work around it and to display better diagnostics.

Smokes for this box have been failing recently, but when I ran them by hand, they passed. @Tux had the bright idea to run 20 tests in a row, and failures occurred on two of those. I was then able to reproduce it. Sometimes it happened after just a couple of runs; sometimes it took 50. The box doesn't have valgrind, and my normal testing on Linux is to use ASAN, and there were no competing threads executing, so what could it be?

The failure is always in a Japanese UTF-8 locale, in the check that a system error message containing non-ASCII characters has the UTF-8 flag set. The test tries all the possible error numbers on the platform by looking at `keys %!` until it finds one whose message contains a non-ASCII character, then looks at the UTF-8 flag, and then goes on to the next locale.

To shorten this description, it turns out that this one-liner invariably gives illegal UTF-8: `LC_ALL=ja_JP.utf8 myperl -Mlocale -le '$!=153; print "$!"'` namely カーネルモジュールのロード中にオブジェクトファイルエ�

The last byte, displayed as �, is illegal; it is the start of an incomplete sequence. Using Google Translate on the legal portion yields: Object file error while loading kernel module.

Using English instead, `LC_ALL=en_US.utf8 myperl -Mlocale -le '$!=153; print "$!"'` yields: Object file error in loading kernel module

If you count the number of bytes in the Japanese version, it is 79. Adding a NUL byte yields 80, a magic number in computers. My educated guess is that HP has declared a buffer to be 80 bytes, and truncates beyond that, without considering if that is in the middle of a UTF-8 character.
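The byte arithmetic can be checked with a short sketch. Python is used here purely for illustration; it is not part of perl or the test suite. The 26 legal characters are copied from the output above and take 3 bytes each in UTF-8, giving 78 bytes, so the 79th byte is the lone lead byte of the next character. The real continuation of the message is unknown: `NEXT_CHARS` below is a hypothetical stand-in, and the 80-byte buffer size is the guess made above, not a documented HP-UX value.

```python
# Sketch: a fixed 80-byte C buffer (79 bytes + NUL) can cut a UTF-8 character in half.
# LEGAL_PART is copied from the output above. NEXT_CHARS is a hypothetical stand-in
# for the unknown text that HP-UX truncated; any 3-byte characters would do.
LEGAL_PART = "カーネルモジュールのロード中にオブジェクトファイルエ"
NEXT_CHARS = "ラー"  # assumption, for illustration only

BUFFER_SIZE = 80  # guessed size of the vendor's buffer, including the NUL


def truncate_like_fixed_buffer(message: str, buffer_size: int = BUFFER_SIZE) -> bytes:
    """Return the bytes that fit in the buffer, leaving one byte for the NUL."""
    return message.encode("utf-8")[: buffer_size - 1]


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    truncated = truncate_like_fixed_buffer(LEGAL_PART + NEXT_CHARS)
    print(len(LEGAL_PART.encode("utf-8")))  # bytes in the legal portion
    print(len(truncated))                   # bytes that fit before the NUL
    print(is_valid_utf8(truncated))         # whether the result is well-formed
```

Cutting one byte earlier, at 78 bytes, would land on a character boundary and leave well-formed UTF-8, which is why only messages of certain lengths trigger the failure.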

Other messages are shorter, and don't overflow the buffer; but there are a few others that do overflow in this locale.

So, why does this happen only sometimes? It is because the loop in the test file reads: `foreach my $err (keys %!) {`

It should instead be `foreach my $err (sort keys %!) {`

which would give reliably reproducible results. What showed up here is the randomness in the order that `keys` returns, which was deliberately introduced by @demerphq to catch issues. Recall that the test stops at the first error message that contains non-ASCII. Most of the time `keys` would first return an error whose message fits in the buffer; that would pass the test and we'd be done with it. Rarely, it would first return one of the few longer messages that don't fit.
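The interaction between iteration order and the stop-at-first-match logic can be simulated in a few lines. This is only a sketch in Python, not the code in lib/locale.t: the error names and messages in `MESSAGES` are invented, and `strerror_truncated` merely imitates the suspected 80-byte buffer.

```python
import random
from typing import Iterable, Optional

BUFFER_SIZE = 80  # guessed vendor buffer size, including the terminating NUL

# Hypothetical error table; names and texts are invented for illustration.
MESSAGES = {
    "EAAA": "plain ASCII message",
    "EBBB": "短いメッセージ",      # non-ASCII, fits in the buffer
    "ECCC": "別の短いメッセージ",  # non-ASCII, fits in the buffer
    "EDDD": "長" * 40,             # non-ASCII, 120 bytes: gets cut at byte 79
}


def strerror_truncated(name: str) -> bytes:
    """Imitate a strerror() that copies into a fixed-size buffer."""
    return MESSAGES[name].encode("utf-8")[: BUFFER_SIZE - 1]


def first_non_ascii_is_valid(names: Iterable[str]) -> Optional[bool]:
    """Check only the first non-ASCII message, as the test loop does.

    Returns None if no message contains non-ASCII bytes.
    """
    for name in names:
        raw = strerror_truncated(name)
        if raw.isascii():
            continue
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    names = list(MESSAGES)
    failures = 0
    for _ in range(1000):
        rng.shuffle(names)
        if first_non_ascii_is_valid(names) is False:
            failures += 1
    print("failures with random order:", failures)
    print("result with sorted order:", first_non_ascii_is_valid(sorted(MESSAGES)))
```

With the sorted order the outcome depends only on which non-ASCII message sorts first, so the result is stable from run to run but says nothing about the other messages.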

Most platforms we test on have not bothered to translate the system error messages to the languages of the locales being run in, so this issue would not show up on them even if they have a limited buffer, because 80 bytes is sufficient for English error messages.

The HP-UX man pages say that it has a strerror_r function. But Configure doesn't detect it. The code there looks correct, so I don't know why it isn't found unless the man pages are wrong. On a system with that function, the buffer and size are passed to it, so we aren't limited by what the vendor thinks is a reasonable size. And in fact, we would pass a 256 byte buffer by default, which should have plenty of room, so that would be an easy fix for this. But why isn't that function being detected?

Steps to Reproduce

`./perl -Ilib lib/locale.t`

over and over until you get a failure

Expected behavior

The tests should pass

Perl configuration

irrelevant

Summary of my perl5 (revision 5 version 39 subversion 8) configuration:
  Derived from: 4210e234cb45c0ad3ecd56b85152559b52131e16
  Platform:
    osname=hpux
    osvers=11.31
    archname=IA64.ARCHREV_0-thread-multi
    uname='hp-ux x2 b.11.31 u ia64 1894272509 unlimited-user license '
    config_args='-des -Uversiononly -Dprefix=/data/perl/usr/khw/blead -Dusedevel -Doptimize=-O0 -DDEBUGGING -A'optimize=-g' -Accflags='-DNO_MATHOMS' -Dman1dir='none' -Dman3dir='none' -Dcc=cc -Dusecbacktrace -Dusethreads'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags =' -D_POSIX_C_SOURCE=199506L -D_REENTRANT -Ae -Wp,-H150000 -D_HPUX_SOURCE -Wl,+vnocompatwarnings -DNO_MATHOMS -DDEBUGGING -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 '
    optimize='-O0 -g'
    cppflags='-Aa -D__STDC_EXT__ -D_HPUX_SOURCE -D_POSIX_C_SOURCE=199506L -D_REENTRANT -Ae -Wp,-H150000 -D_HPUX_SOURCE -Wl,+vnocompatwarnings -DNO_MATHOMS -DDEBUGGING -I/usr/local/include'
    ccversion='B3910B A.06.28.02'
    gccversion=''
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=4321
    doublekind=4
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=2
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='/usr/bin/ld'
    ldflags =' -L/usr/local/lib -L/usr/lib/hpux32'
    libpth=/usr/local/lib /usr/lib/hpux32 /lib /usr/lib /usr/ccs/lib
    libs=-lcl -lpthread -lndbm -lgdbm -ldl -lm -lsec -lc
    perllibs=-lcl -lpthread -ldl -lm -lsec -lc
    libc=/usr/lib/hpux32/libc.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_hpux.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-B,deferred '
    cccdlflags='+Z'
    lddlflags='-b +vnocompatwarnings -L/usr/local/lib -L/usr/lib/hpux32'

Characteristics of this binary (from libperl):
  Compile-time options:
    DEBUGGING
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    NO_MATHOMS
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_ZAPHOD32
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_TRACK_MEMPOOL
    PERL_USE_DEVEL
    PERL_USE_SAFE_PUTENV
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
  Locally applied patches:
    uncommitted-changes
  Built under hpux
  Compiled at Feb 15 2024 02:41:22
  %ENV:
    PERL5OPT="-w"
    PERL_DIFF_TOOL="wgdiff"
    PERL_POD_PEDANTIC="1"
    PERL_TEST_HARNESS_ASAP="1"
  @INC:
    /data/perl/usr/khw/perl/blead_cc/lib
    /data/perl/usr/khw/perl/blead_cc/t
    /data/perl/usr/khw/blead/lib/site_perl/5.39.8/IA64.ARCHREV_0-thread-multi
    /data/perl/usr/khw/blead/lib/site_perl/5.39.8
    /data/perl/usr/khw/blead/lib/5.39.8/IA64.ARCHREV_0-thread-multi
    /data/perl/usr/khw/blead/lib/5.39.8

khwilliamson, Feb 15 '24 03:02

On Thu, 15 Feb 2024, 11:00 Karl Williamson wrote:

> So, why does this happen only sometimes? It is because the loop in the test file reads: `foreach my $err (keys %!) {`
>
> It should instead be `foreach my $err (sort keys %!) {`
>
> which would give reliably reproducible results. The randomness in the return of keys showed up here that was deliberately introduced by @demerphq https://github.com/demerphq to catch issues.

But if this bug is sensitive to key order, then doesn't that mean that sorting the keys might hide such bugs just as much as it might cause them to be seen every time? I.e., if the list was sorted you might never see it at all? Also, if the key order matters, then isn't it likely there is a more complex interaction going on than buffer truncation?

Fwiw, the key order is randomized partially as a security measure and partially to prevent users from baking in a dependency on a specific hash function, so we are free to change the hash function in the future. The fact that it helps find edge-case bugs is just a bonus and not the intent.

Yves

demerphq, Feb 15 '24 05:02

See the tail of my amended message. Yes, sorting is not the answer to this bug. It isn't because of a more complex interaction; it is because the test doesn't look at every available message; it stops when it gets to the first one containing non-ASCII. That means if the sorted first one doesn't get truncated, then the test passes; if the first one is too long, the test reliably fails. The test was not designed to be comprehensive; we could make it so by testing every error on the platform.

The bug is in HP-UX, not perl. I'm unsure of what the best workaround is. Sorting does hide the problem, since the first message returned in this locale passes.

khwilliamson, Feb 15 '24 05:02

Given it is an OS error and not a perl error, and assuming that this specific case will never get an update from HP (at least not on the box this flaw was found on), would the best test fix not be to check only, say, the first 60 valid characters?

diff --git a/lib/locale.t b/lib/locale.t
index 498c7f3ce8..84e2caac0b 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -2184,8 +2184,8 @@ foreach my $Locale (@Locale) {
                 foreach my $err (keys %!) {
                     use Errno;
                     $! = eval "&Errno::$err";   # Convert to strerror() output
-                    my $errnum = 0+$!;
-                    my $strerror = "$!";
+                    my $errnum   = 0+$!;
+                    my $strerror = substr "$!", 0, 60;
                     if ("$strerror" =~ /\P{ASCII}/) {
                         $ok14 = utf8::is_utf8($strerror);
                         no locale;
@@ -2240,7 +2240,7 @@ foreach my $Locale (@Locale) {
             no locale;
             use Errno;
             $! = eval "&Errno::$err";   # Convert to strerror() output
-            my $strerror = "$!";
+            my  $strerror = substr "$!", 0, 60;
             if ($strerror =~ /\P{ASCII}/) {
                 $ok21 = 0;
                 debug(disp_str("non-ASCII strerror=$strerror"));

/me realizes that the substr might be executed on a raw (byte) string, which just "moves" the problem, but it is just a thought
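To make the concern about raw strings concrete: cutting a byte string at a fixed offset can itself land in the middle of a character, so any length-based workaround would need to back up to a character boundary. The following is a minimal sketch in Python of that idea, not something perl or lib/locale.t does; `trim_incomplete_utf8_tail` is a hypothetical helper name.

```python
def trim_incomplete_utf8_tail(data: bytes) -> bytes:
    """Drop a trailing, incomplete UTF-8 sequence; leave everything else untouched."""
    # Walk back over at most 3 continuation bytes (10xxxxxx) to find the lead byte.
    i = len(data)
    steps = 0
    while i > 0 and steps < 3 and (data[i - 1] & 0xC0) == 0x80:
        i -= 1
        steps += 1
    if i == 0:
        return data  # empty, or no lead byte found: nothing sensible to trim

    lead = data[i - 1]
    if lead < 0x80:
        expected = 1            # ASCII
    elif lead >> 5 == 0b110:
        expected = 2            # 110xxxxx starts a 2-byte sequence
    elif lead >> 4 == 0b1110:
        expected = 3            # 1110xxxx starts a 3-byte sequence
    elif lead >> 3 == 0b11110:
        expected = 4            # 11110xxx starts a 4-byte sequence
    else:
        return data             # not a lead byte; malformed input is left alone

    have = len(data) - (i - 1)  # bytes present from the lead byte to the end
    if have < expected:
        return data[: i - 1]    # cut off the incomplete final character
    return data


if __name__ == "__main__":
    message = "ファイルエラー".encode("utf-8")
    cut = message[:-1]                      # simulate truncation mid-character
    print(trim_incomplete_utf8_tail(cut).decode("utf-8"))
```

The helper only repairs the tail; it does not validate the rest of the string, so it would complement rather than replace a check such as the UTF-8 flag test.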

Tux, Feb 15 '24 07:02

Is this a new issue in 5.39? I.e. is it a release blocker, or has it been around since at least 5.38 if not earlier?

Additionally, I see some recent commits by @khwilliamson - do those fix the issue? Or does it remain even despite those?

leonerd, Mar 14 '24 17:03

This has been fixed by 3c1cbf53549280c1468a00dd3c5ae7055c0a0bbe. Now strerror_r is selected to substitute automatically for strerror via reentr.h, and we use a buffer large enough to contain the text that overflows the buffer that HP-UX provides for plain strerror. strerror_r is not in C99, and this bug could occur on other systems, namely those whose providers haven't kept up with the shift to Unicode. I looked at a more general fix, but it wasn't simple, and we don't know if any such systems exist that don't have strerror_r. So I'm closing this.

And this is a libc bug, not a perl bug.

khwilliamson, Mar 14 '24 23:03