
5.40.0 onwards won't build Darwin fat binaries

Open gsteemso opened this issue 1 year ago • 21 comments

Description All Perls which I have built thus far will correctly build as NeXT-style fat (multi-architecture) binaries when the directions in README.macosx are followed – except the newest ones, beginning with version 5.40.0, which fail messily at the make step (apparently, by applying the variable-size expectations appropriate to 64-bit sub-builds to the corresponding variables in 32-bit sub-builds). It generates thousands of lines of compiler warnings that a bit shift has exceeded the width of the data type, culminating in a fatal error involving a negative bit-field width.
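For readers unfamiliar with the final error: a "negative bit-field width" diagnostic is the usual symptom of a compile-time assertion written for compilers that predate C11's _Static_assert. The following is a minimal sketch in C of that general technique, not Perl's actual macro; the name SKETCH_STATIC_ASSERT and the conditions checked are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical compile-time assertion for pre-C11 compilers.
 * When cond is true the bit-field is 1 bit wide and the typedef is harmless;
 * when cond is false the width is -1 and the compiler rejects the
 * declaration with an error about a negative bit-field width. */
#define SKETCH_STATIC_ASSERT(cond, name) \
    typedef struct { \
        unsigned int failed_##name : (cond) ? 1 : -1; \
    } sketch_assert_##name

/* Holds on every architecture, so this compiles in every -arch pass. */
SKETCH_STATIC_ASSERT(sizeof(int64_t) * CHAR_BIT == 64, int64_is_64_bits);

/* Holds only where long is 8 bytes. Uncommenting it would break an ILP32
 * pass (where long is 4 bytes) while an LP64 pass still succeeds. */
/* SKETCH_STATIC_ASSERT(sizeof(long) == 8, long_is_8_bytes); */

/* Small helper so the sketch has something to call at run time. */
int sketch_long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```

Because the check happens at compile time, a condition that depends on type sizes can pass in one architecture's compilation and fail in another's, even though both come from a single cc invocation.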

Steps to Reproduce Follow the instructions in README.macosx. Run ./Configure, then make. You can’t run make test and make install because it errors out during make.

The ./Configure line I used was:

Prefix=/Users/gsteemso/devel/perl/built/5.40.0u
Sdk=/Developer/SDKs/MacOSX10.5.sdk
./Configure -des -Dprefix=${Prefix} -Uvendorprefix= -Dprivlib=${Prefix}/lib -Darchlib=${Prefix}/lib -Dman1dir=${Prefix}/share/man/man1 -Dman3dir=${Prefix}/share/man/man3 -Dman3ext=3pl -Doptimize=-Os -Dsitearch=${Prefix}/lib/site_perl -Dsitelib=${Prefix}/lib/site_perl -Dperladmin=none -Dstartperl='#!/usr/local/opt/perl/bin/perl' -Duseshrplib -Duselargefiles -Dusenm -Dusethreads -Accflags="-DNO_MATHOMS -mcpu=970 -arch ppc -arch ppc64 -nostdinc -B${Sdk}/usr/include/gcc -B${Sdk}/usr/lib/gcc -isystem${Sdk}/usr/include -F${Sdk}/System/Library/Frameworks" -Aldflags="-arch ppc -arch ppc64 -Wl,-syslibroot,${Sdk}"

Expected behavior Perl should be constructed and installed as per usual, with all compiled code built as fat binaries in the manner normal for Macs.

Perl configuration The configuration cannot be extracted because Perl never finishes building, but the corresponding one for a pure ppc64 build looks like this:

Summary of my perl5 (revision 5 version 40 subversion 0) configuration:

 Platform:
   osname=darwin
   osvers=9.8.0
   archname=darwin-thread-multi-2level
   uname='darwin nosferalto.local 9.8.0 darwin kernel version 9.8.0:
wed jul 15 16:57:01 pdt 2009; root:xnu-1228.15.4~1release_ppc power
macintosh '
   config_args='-des -Dprefix=/Users/gsteemso/devel/perl/built/5.40.0
-Uvendorprefix= -Dprivlib=/Users/gsteemso/devel/perl/built/5.40.0/lib
-Darchlib=/Users/gsteemso/devel/perl/built/5.40.0/lib
-Dman1dir=/Users/gsteemso/devel/perl/built/5.40.0/share/man/man1
-Dman3dir=/Users/gsteemso/devel/perl/built/5.40.0/share/man/man3
-Dman3ext=3pl -Doptimize=-Os
-Dsitearch=/Users/gsteemso/devel/perl/built/5.40.0/lib/site_perl
-Dsitelib=/Users/gsteemso/devel/perl/built/5.40.0/lib/site_perl
-Dperladmin=none -Dstartperl=#!/usr/local/opt/perl/bin/perl
-Duseshrplib -Duselargefiles -Dusenm -Dusethreads -Duse64bitall
-Accflags=-DNO_MATHOMS -mcpu=970 -arch ppc64 -nostdinc
-B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc
-B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc
-isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include
-F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks
-Aldflags=-arch ppc64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.5.sdk
-Alddlflags=-mmacosx-version-min=10.5 -arch ppc64 -bundle -undefined
dynamic_lookup -L/usr/local/lib -fstack-protector'
   hint=recommended
   useposix=true
   d_sigaction=define
   useithreads=define
   usemultiplicity=define
   use64bitint=define
   use64bitall=define
   uselongdouble=undef
   usemymalloc=n
   default_inc_excludes_dot=define
 Compiler:
   cc='cc'
   ccflags ='-std=gnu99 -fno-common -DPERL_DARWIN
-mmacosx-version-min=10.5 -DNO_THREAD_SAFE_QUERYLOCALE
-DNO_POSIX_2008_LOCALE -arch ppc64 -DNO_MATHOMS -mcpu=970 -arch ppc64
-nostdinc -B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc
-B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc
-isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include
-F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_FORTIFY_SOURCE=2'
   optimize='-Os'
   cppflags='-arch ppc64 -std=gnu99 -fno-common -DPERL_DARWIN
-mmacosx-version-min=10.5 -DNO_THREAD_SAFE_QUERYLOCALE
-DNO_POSIX_2008_LOCALE -arch ppc64 -DNO_MATHOMS -mcpu=970 -arch ppc64
-nostdinc -B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc
-B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc
-isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include
-F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
   ccversion=''
   gccversion='4.2.1 (Apple Inc. build 5666) (dot 3)'
   gccosandvers=''
   intsize=4
   longsize=8
   ptrsize=8
   doublesize=8
   byteorder=87654321
   doublekind=4
   d_longlong=define
   longlongsize=8
   d_longdbl=define
   longdblsize=16
   longdblkind=6
   ivtype='long'
   ivsize=8
   nvtype='double'
   nvsize=8
   Off_t='off_t'
   lseeksize=8
   alignbytes=8
   prototype=define
 Linker and Libraries:
   ld='cc -arch ppc64'
   ldflags =' -mmacosx-version-min=10.5 -arch ppc64 -arch ppc64
-Wl,-syslibroot,/Developer/SDKs/MacOSX10.5.sdk -fstack-protector
-L/usr/local/lib'
   libpth=/usr/local/lib /usr/lib
   libs=-lpthread -ldbm -ldl -lm -lutil -lc
   perllibs=-lpthread -ldl -lm -lutil -lc
   libc=/usr/lib/libc.dylib
   so=dylib
   useshrplib=true
   libperl=libperl.dylib
   gnulibc_version=''
 Dynamic Linking:
   dlsrc=dl_dlopen.xs
   dlext=bundle
   d_dlsymun=undef
   ccdlflags=' '
   cccdlflags=' '
   lddlflags=' -mmacosx-version-min=10.5 -bundle -undefined
dynamic_lookup -mmacosx-version-min=10.5 -arch ppc64 -bundle
-undefined dynamic_lookup -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl):
 Compile-time options:
   HAS_LONG_DOUBLE
   HAS_STRTOLD
   HAS_TIMES
   MULTIPLICITY
   NO_MATHOMS
   PERLIO_LAYERS
   PERL_COPY_ON_WRITE
   PERL_DONT_CREATE_GVSV
   PERL_HASH_FUNC_SIPHASH13
   PERL_HASH_USE_SBOX32
   PERL_MALLOC_WRAP
   PERL_OP_PARENT
   PERL_PRESERVE_IVUV
   PERL_USE_SAFE_PUTENV
   USE_64_BIT_ALL
   USE_64_BIT_INT
   USE_ITHREADS
   USE_LARGE_FILES
   USE_LOCALE
   USE_LOCALE_COLLATE
   USE_LOCALE_CTYPE
   USE_LOCALE_NUMERIC
   USE_LOCALE_TIME
   USE_PERLIO
   USE_PERL_ATOF
   USE_REENTRANT_API
 Built under darwin
 Compiled at Aug  4 2024 15:00:48
 @INC:
   /Users/gsteemso/devel/perl/built/5.40.0/lib/site_perl
   /Users/gsteemso/devel/perl/built/5.40.0/lib

(It should be noted that, in a pure ppc64 build done without the aid of a package manager, -Alddlflags=xxxxx must also be set by hand on the ./Configure command line, because extensions' shared-library makefiles fail to propagate the compiler flags that tell it which variant of the CPU architecture to target. Without that change, the individual .o files are still built correctly, but their coalescence into library .bundles is botched.)

gsteemso avatar Aug 05 '24 00:08 gsteemso

Could you please attach a build log and the generated config.sh?

tonycoz avatar Aug 05 '24 23:08 tonycoz

Please see these attachments. There should be 5. I included config.sh (I had to rename it for Github) and the stdout and stderr captures for each of Configure and make. config.sh.txt Configure.stdout.log Configure.stderr.log make.stdout.log make.stderr.log

gsteemso avatar Aug 11 '24 17:08 gsteemso

I should add that something went a bit odd a couple of days ago, such that a lot of GCC's usual stderr output has stopped appearing. I'm still trying to find anything that changed.

gsteemso avatar Aug 11 '24 17:08 gsteemso

> Please see these attachments. There should be 5. I included config.sh (I had to rename it for Github) and the stdout and stderr captures for each of Configure and make. config.sh.txt Configure.stdout.log Configure.stderr.log make.stdout.log make.stderr.log

I have no particular expertise in this area. However, it occurs to me that since you are getting a segfault as early in the process as ./Configure, you could begin by getting a tarball of perl-5.38, configuring with the same arguments as previously, and seeing whether ./Configure completes successfully and segfault-free. That would open up the possibility of bisection.

jkeenan avatar Aug 11 '24 19:08 jkeenan

I believe the segfault during ./Configure is probably an expected failure resulting from an unsuccessful test, because it does not seem to bother it any. The build process continues unimpeded until the big halt in what ought to be the middle of that 'make' run.

I can already tell you that 5.38.x builds successfully with the same parameters, as do all earlier versions that I tried. That 5.40.0 et seq do not build successfully when all others did before is the entire problem here.

gsteemso avatar Aug 12 '24 01:08 gsteemso

> I can already tell you that 5.38.x builds successfully with the same parameters, as do all earlier versions that I tried. That 5.40.0 et seq do not build successfully when all others did before is the entire problem here.

So in principle this is bisectable, with (roughly) these steps:

  • If you do not already have a git checkout of the core distribution on this machine, follow the instructions in perldoc perlgit to clone the repository and get a local checkout.
  • As a precaution, run the start and end commits manually: git checkout v5.38.0; sh ./Configure -des -Dusedevel [your other config options or a simplified subset thereof] && make. That should complete successfully. git clean -dfxq; git checkout v5.40.0; [as above]; that should fail. Note: If you can reproduce the build failure at tag 5.40.0 with a smaller list of configuration options (e.g., just -Dusenm -Dusethreads -Duse64bitall -Accflags=-DNO_MATHOMS), that would greatly simplify our analysis.
  • The documentation for bisection can be found in perldoc Porting/bisect-runner.pl. The bisection program itself will be run something like this:
$ perl Porting/bisect.pl \
-D[config options] \
--start=v5.38.0 \
--end=v5.40.0 \
--test-build

bisect.pl will run in the checkout. Assuming it gets past the end and start revisions, it will start to log its progress in the ./.git subdirectory beneath the checkout. You can follow that progress in a separate terminal with something like cat BISECT_RUN; tail BISECT_LOG. HTH!

jkeenan avatar Aug 12 '24 12:08 jkeenan

Please try adding -Duse64bitint to the Configure command-line to ensure both 32-bit and 64-bit builds are using the same sized UV and IV types, I suspect they're different here causing the static assertion to fail.

I'm able to compile a -arch x86_64 -arch arm64 build, but those are both 64-bit builds, so there's no type size mismatches.

tonycoz avatar Aug 15 '24 05:08 tonycoz

Well, I have a few things to report.

• Adding -Duse64bitint did not help. I can’t imagine why not – as was pointed out, it ought to make things the same size internally. (Of course, even if it had worked, the resulting executable would not be transportable to lesser Macs – defeating a large part of the purpose of building a fat binary in the first place. It’s still bizarre that it didn’t work, of course.)

• The thousands of lines of warnings about a bit shift exceeding the width of the type actually, it turns out, also occur on a successful build; so I believe the hypothesis about it being due to the size difference between compiler runs is likely correct. I have set up and am currently running the suggested bisection to figure out what change made it start having actual build problems with the perceived mismatch.

This is a fast machine for its age but that age is 20 years. I will almost certainly not get answers from the bisection before tomorrow (Friday), and quite possibly not until Saturday.

gsteemso avatar Aug 16 '24 01:08 gsteemso

Apparently I spoke too soon. That bisection program is genius! In 2602 seconds, it determined that the first commit to cause a failure was 1e3b3238f23137440041d8883e041e4da74876f5, dated March 13th 2024.

The command line I fed to bisect was:

../othergit/Porting/bisect.pl --test-build --target=miniperl --start=v5.38.0 --end=v5.40.0 -Dprefix=../built -Uvendorprefix= -Dperladmin=none -Duseshrplib -Duselargefiles -Dusenm --Dusethreads -Accflags='-DNO_MATHOMS -arch ppc -arch ppc64 -nostdinc -B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc -B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc -isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include -F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks' -Aldflags='-arch ppc -arch ppc64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.5.sdk'

I hope this all means something useful to someone.

gsteemso avatar Aug 16 '24 05:08 gsteemso

I don't see how -Duse64bitint could help. If you run Configure on a 64-bit platform, it will just see that sizeof (long) == 8 and use that, hardcoding #define IVTYPE long in config.h. Plus we have INTSIZE, LONGSIZE, SHORTSIZE all hardcoded/configured in config.h.

As far as I can tell, building for different architectures requires different configs.
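The point about hardcoded sizes can be illustrated with a small C sketch. SKETCH_CONFIGURED_LONGSIZE is a hypothetical stand-in for a size that one Configure run writes into config.h; the value 8 is taken from the longsize reported for the pure ppc64 build earlier in this thread. The function name is likewise made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a size that a single Configure run hardcodes.
 * 8 matches the longsize shown for the ppc64-only configuration above. */
#define SKETCH_CONFIGURED_LONGSIZE 8

/* Returns 1 when the hardcoded size agrees with the size the compiler
 * actually uses for the architecture being compiled, 0 otherwise. */
int sketch_longsize_matches(size_t actual_longsize)
{
    return actual_longsize == (size_t)SKETCH_CONFIGURED_LONGSIZE;
}
```

Calling sketch_longsize_matches(sizeof(long)) would return 1 in an LP64 pass such as ppc64 and 0 in an ILP32 pass such as ppc, where long is 4 bytes, even though both passes see the same hardcoded constant.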

mauke avatar Aug 16 '24 06:08 mauke

> Apparently I spoke too soon. That bisection program is genius! In 2602 seconds, it determined that the first commit to cause a failure was 1e3b323, dated March 13th 2024.
>
> The command line I fed to bisect was:
>
> ../othergit/Porting/bisect.pl --test-build --target=miniperl --start=v5.38.0 --end=v5.40.0 -Dprefix=../built -Uvendorprefix= -Dperladmin=none -Duseshrplib -Duselargefiles -Dusenm --Dusethreads -Accflags='-DNO_MATHOMS -arch ppc -arch ppc64 -nostdinc -B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc -B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc -isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include -F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks' -Aldflags='-arch ppc -arch ppc64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.5.sdk'
>
> I hope this all means something useful to someone.

A number of points ...

  • Based on perldoc Porting/bisect-runner.pl, I think you should have said -Dusethreads (one hyphen) in the above rather than --Dusethreads. However, I can't say that that made any difference in the results.

  • I have not previously seen both --test-build and --target=miniperl used in an invocation to bisect.pl, so I'm slightly skeptical of that result. My understanding (which may be incorrect) is that with --test-build we need to get as far as ./perl for a PASS, whereas with --target=miniperl we only need to get to ./miniperl. Could you try a bisection using only the latter to see if it identifies the same commit as breaking? That would be:

perl Porting/bisect.pl \
-D[your other config options] \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--target=miniperl

Then try:

perl Porting/bisect.pl \
-D[your other config options] \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--test-build

Do you get the same results?

  • What would also be helpful is if you could determine whether the interaction of 1e3b3238f2 and one or more of your many config options caused the failure to build. Consider repeating the above bisection without any -D[config options] and see whether even that fails to build.
perl Porting/bisect.pl \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--target=miniperl

Then add config options one at a time, starting with those that are not Mac-specific such as -Dusethreads.

jkeenan avatar Aug 16 '24 11:08 jkeenan

mauke, the Mac-specific aspects of this are a bit odd by everyone else's standards, because they build for all listed architectures simultaneously -- there is only one Configure run for ALL of them combined, not one per architecture as I understand might be done, for example, under Linux. That's why we thought -Duse64bitint might have made it build successfully.

jkeenan, in order:

  • you're correct, I ran that with one hyphen and accidentally typo'd when copying the line into my email.

  • --test-build and --target=xxx are required to both be used in this case. According to the documentation, without the first a separate test case must be specified (it gets run after the build succeeds, which is expected to happen every time, and would not here), and without the second, successfully completing the build is assumed (possibly the source of your misapprehension). When I tried running it using only --target=xxxx, it refused to run at all and merely gave me back the usage instructions (I hadn't specified a test case).

  • you were absolutely correct that I specified more options than were required. I am rerunning the bisection with no -D options at all except -Dprefix=xxxx (to prevent it overwriting my system Perl), and only those -A options listed as being necessary for a Mac-style multi-arch build (both of them, unfortunately, but if you look closely you'll see that nearly all the given components do nothing except tell the compiler where the system libraries are).

gsteemso avatar Aug 16 '24 17:08 gsteemso

I should add that I had to restrict the test builds to only try as far as building miniperl, because various of the library modules require an assortment of trivial patches to build correctly under specific versions of Perl. Luckily the failure occurs during that early phase, so I did not need to muck about trying to tell it to apply a patch only during builds of certain versions.

gsteemso avatar Aug 16 '24 17:08 gsteemso

> they build for all listed architectures simultaneously -- there is only one Configure run for ALL of them combined, not one per architecture

The only way that could work is if IVTYPE = int64_t and UVTYPE = uint64_t, but there is no way to force Configure to choose those as far as I can see.
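As a sketch of that idea in C: fixed-width typedefs from <stdint.h> have the same width regardless of which -arch pass compiles them, unlike long. The names sketch_iv and sketch_uv are hypothetical and are not what Configure generates.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical integer types pinned to 64 bits on every architecture. */
typedef int64_t  sketch_iv;
typedef uint64_t sketch_uv;

/* Width in bits of the signed type; int64_t has exactly 64 bits and no
 * padding, so this is the same in a 32-bit pass and a 64-bit pass. */
int sketch_iv_bits(void)
{
    return (int)(sizeof(sketch_iv) * CHAR_BIT);
}

/* Width in bits of the unsigned type, for symmetry. */
int sketch_uv_bits(void)
{
    return (int)(sizeof(sketch_uv) * CHAR_BIT);
}
```

With types like these, a single set of size constants would describe every slice of a fat binary, which is what a single shared configuration needs.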

mauke avatar Aug 16 '24 17:08 mauke

I won't pretend I understand how it works. The Mac compiler that was current at that time, and which I am using now, was a modified version of GCC 4.2.1. It could be given any number of “-arch xxxx” parameters and would then repeat each compilation with all of the other parameters the same, but a distinct target platform; then stitch the results into a universal binary using a tool with the amusing name “lipo” (because it was often used to slim a fat binary down to a single-architecture slice). The five platforms then current were “ppc” (32-bit PowerPC), “i386” (32-bit x86 – they were all lumped together as “i386” even though the cross-compiler, for example, had a prefix containing “686”), “arm” (32-bit ARM as used in early iPhones), “ppc64” (64-bit PowerPC, which consisted solely of the IBM PowerPC 970), and “x86_64” (exactly what it says). Of those, only two were even able to handle 64-bit data in a single action; yet Perl has historically been able to compile with 32- and 64-bit values (IVs, UVs, etc.) simultaneously. At first glance I'd have assumed it just compiled everything with 32-bit NVs, but Configure does in fact appear to take 64-bit platforms as 64-bit. I have no idea how it works but it did up until, as “bisect” has once again informed me, the same commit I named earlier.

gsteemso avatar Aug 16 '24 18:08 gsteemso

Reverting that one commit against blead had no effect.

gsteemso avatar Aug 16 '24 19:08 gsteemso

> I don't see how -Duse64bitint could help. If you run Configure on a 64-bit platform, it will just see that sizeof (long) == 8 and use that, hardcoding #define IVTYPE long in config.h. Plus we have INTSIZE, LONGSIZE, SHORTSIZE all hardcoded/configured in config.h.
>
> As far as I can tell, building for different architectures requires different configs.

I remember building multiarch with i386 and x86_64 in one binary.

There were definitely some config issues; Configure has a Darwin-specific check to ensure alignbytes is at least 8.

I never looked too hard at it.

I suspect this case isn't so much a new bug, but the static assert detecting an old bug.

tonycoz avatar Aug 18 '24 23:08 tonycoz

I'm not actually certain there is a true bug, here. Is it plausible that the assert is framed in such a way that it gets a false positive from the disparity in word sizes between built-for architectures?

gsteemso avatar Sep 01 '24 04:09 gsteemso

> I'm not actually certain there is a true bug, here.

If that particular assertion fails, the code following it won't be valid: it may lose precision when converting from an NV to an IV but report an exact conversion.
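To illustrate the kind of precision loss at stake, here is a minimal C sketch, assuming NV is an IEEE 754 double with a 53-bit significand (doublesize=8 in the configuration above). The helper name is hypothetical, and this is not the code guarded by the assertion.

```c
#include <stdint.h>

/* Returns 1 if iv survives a round trip through a double unchanged,
 * 0 if the conversion to double had to round. Assumes IEEE 754 binary64. */
int sketch_roundtrips_via_double(int64_t iv)
{
    /* volatile forces the value to be rounded to double precision even on
     * targets that evaluate floating point in wider registers. */
    volatile double nv = (double)iv;

    /* Converting a double outside the int64_t range back to an integer is
     * undefined behaviour, so treat that case as "not exact". */
    if (nv >= 9223372036854775808.0 || nv < -9223372036854775808.0)
        return 0;

    return (int64_t)nv == iv;
}
```

Every 32-bit integer fits in 53 bits and round-trips exactly, but a 64-bit value such as 2^53 + 1 does not, so code that assumes one set of widths can report an exact conversion that did not actually happen under the other.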

tonycoz avatar Sep 04 '24 04:09 tonycoz

Let me rephrase my supposition. I'm aware that's the purpose of the assert. The reason the assert is failing is that, on a 32-bit build, the total size of the NV is smaller than the (for a 64-bit build) reported size that is transferrable with full accuracy. (I think I got that straight, there are something like six different figures involved for three different quantities, or thereabouts.) The same figures being used for both 32- and 64-bit builds – which happen simultaneously in a universal binary – are, I believe, causing a "false positive" (false negative?) assertion failure.

gsteemso avatar Sep 13 '24 04:09 gsteemso

The assert should be passing during the 64-bit build pass and incorrectly failing during the 32-bit build pass.

gsteemso avatar Sep 13 '24 04:09 gsteemso