dmd
Fix macOS ld: multiple errors: symbol count from symbol table and dynamic symbol table differ
Encode macosx_version_min or build_version into the object file, originally authored by @jacob-carlborg in #10476.
This has been simplified to omit parsing the SDK version. Based on what I see GCC is doing (-platform_version macos $version_min 0.0), this information is not required in order for things to work.
Either this will fix the new ld errors, or we'll have to start adding -L-ld_classic to the linker command.
https://forum.dlang.org/thread/[email protected]
Thanks for your pull request, @ibuclaw!
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + dmd#16178"
ld: load command #3 extends beyond the end of the load commands file '../generated/osx/release/64/lexer.o' for architecture x86_64
It looks like encoding this information into the binary isn't working, but passing it via linker flags works just fine. Someone (not me) will have to repurpose the OSX version handling for dmd.link then.
If no one picks this up, I'm going to start downgrading all macOS pipelines to ignore failures, as this is blocking other work.
I think all that link.d needs is this code:
argv.push("-L-platform_version");
argv.push("-Lmacos");
argv.push("-L0.0");
to be added here https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L610.
I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.
As far as I can tell, the syntax of the option is:
-platform_version <platform> <min_version> <sdk_version>
https://opensource.apple.com/source/ld64/ld64-530/doc/man/man1/ld.1.auto.html (Formatted version)
-platform_version platform min_version sdk_version
This is set to indicate the platform, oldest supported version of that platform that
output is to be used on, and the SDK that the output was built against. platform is a
numeric value as defined in <mach-o/loader.h>, or it may be one of the following
strings:
o macos
o ios
o tvos
o watchos
o bridgeos
o mac-catalyst
o ios-simulator
o tvos-simulator
o watchos-simulator
o driverkit
Specifying a newer min or SDK version enables the linker to assume features of that OS
or SDK in the output file. The format of min_version and sdk_version is a version number
such as 10.13 or 10.14
The sdk_version doesn't seem to be important, but the min_version would be rejected if invalid.
I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.
getenv and sscanf (the first commit in this PR is a simplified version of #10476 - https://github.com/dlang/dmd/pull/16178/commits/c01f8cc77e03be9b9fde4d8cb1e3b4ba886615da)
I wouldn't hard-code -L-platform_version without first checking the version of ld64 (it seems to need >= 512 - https://github.com/apple-opensource/ld64/commit/03dfd5524c89ea8e1d3a01c8a487934fb5433514#diff-eaf444f2d9662234ef8a047f73d6b25b7dd2dc39c3e426c6214b4702fec8287bR251), as it's a new option and older ld64 linkers won't understand it:
- -macosx_version_min 10.9 for all versions up to 10.14 inclusive.
- -platform_version macos 10.9 0.0 for all versions from 10.15 onwards (DMD releases are still built on a macOS 10.13 box).
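To make the rule above concrete, here is a rough sketch in Python (not the D code that would go into link.d). The function names are hypothetical, the 10.15 cut-off is taken from the list above, and falling back to 10.9 when MACOSX_DEPLOYMENT_TARGET is unset is an assumption based on the value used in ci/run.sh.

```python
import os
import re

def parse_version(text):
    """Parse 'major.minor[.patch]' into a tuple of ints; raise on anything else."""
    if not re.fullmatch(r"\d+(\.\d+){1,2}", text):
        raise ValueError(f"invalid version: {text!r}")
    return tuple(int(part) for part in text.split("."))

def min_version_flags(host_macos, env=None):
    """Return dmd -L flags that hand the minimum macOS version to the linker.

    host_macos: version of the macOS host doing the link, e.g. "10.13".
    env: environment mapping (defaults to os.environ).
    """
    env = os.environ if env is None else env
    # Assumption: default to 10.9 when no deployment target is set.
    target = env.get("MACOSX_DEPLOYMENT_TARGET", "10.9")
    parse_version(target)  # reject malformed values early
    if parse_version(host_macos) >= (10, 15):
        # New-style option: <platform> <min_version> <sdk_version>
        return ["-L-platform_version", "-Lmacos", f"-L{target}", "-L0.0"]
    # Older ld64 only understands the legacy option.
    return ["-L-macosx_version_min", f"-L{target}"]

print(min_version_flags("10.13", env={}))
print(min_version_flags("13.6", env={"MACOSX_DEPLOYMENT_TARGET": "11.0"}))
```

Keying the choice on the host OS version rather than on the ld64 version is a simplification; a real implementation would probably want to query the linker itself.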
Hmm, so building the compiler itself with the host compiler is no problem at all, but compiling build.d is/was? WTF?
I think CI_DFLAGS is used to build the host compiler. As for all the testsuite, my only guess is they aren't complex enough.
CI_DFLAGS is used to build the fresh compiler, but only set for FreeBSD. DMD itself is surely much much more complex than build.d, so when using the same host compiler to build both, I'd obviously expect at least the same issues for the actual compiler as for build.d.
Plus we have https://github.com/dlang/dmd/blob/6aa84abc575df70ed06b86629218b5ab9d70bf77/.github/workflows/main.yml#L85
which should AFAIK prevent using the latest Xcode features/regressions.
I think CI_DFLAGS is used to build the host compiler.
Oh of course it is; I just somehow totally missed that you extend them here in this PR. facepalm [The added tab-indentation doesn't really help.]
Plus we have https://github.com/dlang/dmd/blob/6aa84abc575df70ed06b86629218b5ab9d70bf77/.github/workflows/main.yml#L85
which should AFAIK prevent using the latest Xcode features/regressions.
Right, I can see a warning about both conflicting in the CI logs (-platform_version wins)
I could set this to 11 to see if it still fails.
10.9 good, 11.0 fails
10.12 fails
10.11 fails
10.10 fails
10.9 second attempt fails.
Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.
-ld_classic gets us further, then fails later in the expected way.
Removing macOS as a required pipeline.
10.9 good, 11.0 fails […] 10.9 second attempt fails. […] Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.
They must have changed the runner base image; the CI job headers should include the image version somewhere.
The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (https://github.com/actions/runner-images/pull/9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...
So in that latest image bump, they changed the default Xcode version from v14 to v15.
Running sudo xcode-select -switch /Applications/Xcode_14.1.app (early) on macos-13 jobs might allow us to use the currently oldest installed Xcode version, hopefully without these 'nice' new Apple ld64 problems.
Edit: https://github.com/dlang/dmd/pull/16194
They must have changed the runner base image; the CI job headers should include the image version somewhere.
The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (actions/runner-images#9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...
It was first seen to fail 2 weeks ago.
https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831
It's probably been a very slow rollout.
It was first seen to fail 2 weeks ago. https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831
Oh wow right, using that image already back then (February 6th):
Runner Image
Image: macos-13
Version: 20240204.1
So indeed, looks like the rollout is extremely slow, and starts way before the image readme.md is updated...
Github search to the rescue.
https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/mach_o/Image.cpp#L346-L347
if ( hasIndSymTab && (symCount != indSymCount))
return Error("symbol count from symbol table and dynamic symbol table differ");
- hasIndSymTab = LC_DYSYMTAB https://github.com/dlang/dmd/blob/9471b25db9ed44d71e0e27956430c0c6a09c16db/compiler/src/dmd/backend/machobj.d#L791-L795
- symCount = symbTabCmd->nsyms https://github.com/dlang/dmd/blob/9471b25db9ed44d71e0e27956430c0c6a09c16db/compiler/src/dmd/backend/machobj.d#L1297-L1299
- indSymCount = dySymTabCmd->iundefsym + dySymTabCmd->nundefsym https://github.com/dlang/dmd/blob/9471b25db9ed44d71e0e27956430c0c6a09c16db/compiler/src/dmd/backend/machobj.d#L1290-L1296
Simplifying the above for mere mortals (such as myself).
dysymtab_cmd.nlocalsym = local_symbuf.length;
dysymtab_cmd.iextdefsym = dysymtab_cmd.nlocalsym;
dysymtab_cmd.nextdefsym = public_symbuf.length;
dysymtab_cmd.iundefsym = dysymtab_cmd.iextdefsym + dysymtab_cmd.nextdefsym;
dysymtab_cmd.nundefsym = extern_symbuf.length + comdef_symbuf.length;
symtab_cmd.nsyms = dysymtab_cmd.nlocalsym +
dysymtab_cmd.nextdefsym +
dysymtab_cmd.nundefsym;
-> const prop
dysymtab_cmd.iundefsym = local_symbuf.length + public_symbuf.length;
dysymtab_cmd.nundefsym = extern_symbuf.length + comdef_symbuf.length;
symtab_cmd.nsyms = local_symbuf.length + public_symbuf.length + dysymtab_cmd.nundefsym;
So linker should see:
symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;
It's a bit convoluted, but they add up to the same number. That means this might just be corruption of the data structure as the linker reads in the object.
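A small Python model of the assignments above may make the "same number" claim easier to check. It only mirrors the quoted field arithmetic; the function names and the buffer lengths passed in are made up for illustration and say nothing about what DMD actually writes to disk.

```python
def dmd_symbol_counts(n_local, n_public, n_extern, n_comdef):
    """Mirror the dysymtab_cmd / symtab_cmd assignments quoted above."""
    nlocalsym = n_local
    iextdefsym = nlocalsym
    nextdefsym = n_public
    iundefsym = iextdefsym + nextdefsym
    nundefsym = n_extern + n_comdef
    nsyms = nlocalsym + nextdefsym + nundefsym
    return {"nsyms": nsyms, "iundefsym": iundefsym, "nundefsym": nundefsym}

def counts_agree(counts):
    """The comparison performed by the quoted dyld check."""
    sym_count = counts["nsyms"]
    ind_sym_count = counts["iundefsym"] + counts["nundefsym"]
    return sym_count == ind_sym_count

# Hypothetical buffer lengths, chosen only to exercise the arithmetic.
counts = dmd_symbol_counts(n_local=1, n_public=5, n_extern=4, n_comdef=0)
print(counts, counts_agree(counts))
```

Under this model the two totals agree for any buffer lengths, so a mismatch in a real object file would have to come from somewhere other than these assignments.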
I don't know entirely what's up with Apple's static linker, but I do know they wrote a new one recently, hence the -ld_classic flag. The error message that @ibuclaw found is from the dynamic linker (dyld). This error message is not available in any released source code of the static linker (ld64), as far as I can tell. Perhaps the static linker is now performing the same check that the dynamic linker already did, but I haven't seen this error at load time.
Anyway, as @ibuclaw posted above, the number of symbols in the symbol table and in the dynamic symbol table needs to be the same. A quick investigation shows this:
- Compile (without linking) the following code with LDC and DMD:
extern (C) int printf(const char*, ...);
void main()
{
printf("foo\n");
}
ldc2 -c -ofmain-ldc.o main.d
dmd -c -ofmain-dmd.o main.d
- Print the load commands of each object file
otool -l main-ldc.o
otool -l main-dmd.o
This gives the following output:
DMD:
LC_SYMTAB: nsyms: 10
LC_DYSYMTAB: iundefsym: 6 nundefsym: 3
LDC:
LC_SYMTAB: nsyms: 12
LC_DYSYMTAB: iundefsym: 10 nundefsym: 2
LC_SYMTAB is the symbol table and LC_DYSYMTAB is the dynamic/indirect symbol table.
As mentioned above, the number of symbols is calculated as:
symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;
According to otool, the two counts match for LDC (10 + 2 = 12), but not for DMD (6 + 3 = 9 against nsyms = 10): LC_DYSYMTAB has one symbol fewer than LC_SYMTAB.
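The comparison can be scripted so it is easy to repeat on other object files. The Python sketch below pulls the three fields out of otool-style text and applies the check; the helper names are hypothetical, and the regular expression assumes each field name is followed by its value, optionally after a colon, as in the abbreviated output above.

```python
import re

def symbol_counts(otool_text):
    """Extract nsyms, iundefsym and nundefsym from `otool -l` style text."""
    fields = {}
    for name in ("nsyms", "iundefsym", "nundefsym"):
        match = re.search(rf"\b{name}:?\s+(\d+)", otool_text)
        if match is None:
            raise ValueError(f"{name} not found")
        fields[name] = int(match.group(1))
    return fields

def check(fields):
    """Return (symCount, indSymCount, equal?) as in the formula above."""
    sym_count = fields["nsyms"]
    ind_sym_count = fields["iundefsym"] + fields["nundefsym"]
    return sym_count, ind_sym_count, sym_count == ind_sym_count

# The abbreviated numbers reported earlier in this comment.
DMD = "LC_SYMTAB: nsyms: 10\nLC_DYSYMTAB: iundefsym: 6 nundefsym: 3"
LDC = "LC_SYMTAB: nsyms: 12\nLC_DYSYMTAB: iundefsym: 10 nundefsym: 2"

for label, text in (("DMD", DMD), ("LDC", LDC)):
    print(label, check(symbol_counts(text)))
# DMD (10, 9, False)
# LDC (12, 12, True)
```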
Also, trying to print the indirect symbol table gives these results:
$ otool -I main-ldc.o
main-ldc.o:
$ otool -I main-dmd.o
main-dmd.o:
indirect symbol table offset is past end of file
So DMD is doing something wrong.
Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.
This is ridiculous
https://github.com/dlang/dmd/blob/9471b25db9ed44d71e0e27956430c0c6a09c16db/compiler%2Fsrc%2Fdmd%2Fbackend%2Fmachobj.d#L1409-L1430
Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.
$ nm -a main-dmd.o
0000000000000050 s EH_frame0
0000000000000068 S _D main.eh
0000000000000038 S __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
U __Dmain
U __d_run_main
U _main
0000000000000018 T _main
0000000000000090 S _main._d_cmain!().main.eh
U _printf
$ nm -a main-ldc.o
0000000000000070 S __D4main11__moduleRefZ
0000000000000060 D __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
U __d_run_main
0000000000000020 T _main
U _printf
0000000000000054 s l_.str
0000000000000000 t ltmp0
0000000000000054 s ltmp1
0000000000000060 d ltmp2
0000000000000070 s ltmp3
0000000000000078 s ltmp4
The above shows all symbols. Unfortunately I haven't figured out which symbol table the nm command prints.
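One way to relate the nm listings to the otool numbers is to tally defined and undefined entries. The Python sketch below does that for the two listings above; it assumes nm is printing the LC_SYMTAB entries (which is exactly the open question here), and the helper name is hypothetical.

```python
def tally(nm_text):
    """Count (defined, undefined) entries in `nm -a` style output."""
    defined = undefined = 0
    for line in nm_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "U":
            undefined += 1  # undefined entries carry no address column
        else:
            defined += 1
    return defined, undefined

NM_DMD = """\
0000000000000050 s EH_frame0
0000000000000068 S _D main.eh
0000000000000038 S __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
U __Dmain
U __d_run_main
U _main
0000000000000018 T _main
0000000000000090 S _main._d_cmain!().main.eh
U _printf
"""

NM_LDC = """\
0000000000000070 S __D4main11__moduleRefZ
0000000000000060 D __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
U __d_run_main
0000000000000020 T _main
U _printf
0000000000000054 s l_.str
0000000000000000 t ltmp0
0000000000000054 s ltmp1
0000000000000060 d ltmp2
0000000000000070 s ltmp3
0000000000000078 s ltmp4
"""

print("DMD", tally(NM_DMD))  # DMD (6, 4)
print("LDC", tally(NM_LDC))  # LDC (10, 2)
```

If the assumption holds, the LDC tally lines up with iundefsym 10 and nundefsym 2, while the DMD listing has four undefined entries against a reported nundefsym of 3. That would point at nundefsym rather than nsyms, but it needs confirming against the actual object file.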
BTW, DMD can now cross-compile. LLVM contains the necessary tools to inspect the object files (llvm-otool and llvm-nm), so a Mac is not strictly needed. Once otool prints the correct number of symbols, a real Mac can be used to test the actual link step. Here's an example:
$ docker run -it --rm --platform linux/amd64 ubuntu:22.04
root@99b0566a8b25:/# apt update && apt install -y curl xz-utils file
root@99b0566a8b25:/# curl -L -O --retry 3 https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.6/clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# curl -L -O --retry 3 https://downloads.dlang.org/releases/2.x/2.107.0/dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# tar xf clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# tar xf dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# ./dmd2/linux/bin64/dmd -target=x86_64-darwin main.d -c
root@99b0566a8b25:~# file main.o
main.o: Mach-O 64-bit x86_64 object, flags:<|SUBSECTIONS_VIA_SYMBOLS>
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-otool -l main.o
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-nm -a main.o
If you need access to a real Mac, you can also borrow a GitHub Action runner using this action: https://github.com/marketplace/actions/debugging-with-tmate. I've used it a lot to debug pipelines and my own action.
I count no fewer than half a dozen issues.
https://issues.dlang.org/show_bug.cgi?id=23517
https://issues.dlang.org/show_bug.cgi?id=24137
https://issues.dlang.org/show_bug.cgi?id=24399
https://issues.dlang.org/show_bug.cgi?id=20297
https://issues.dlang.org/show_bug.cgi?id=24402
https://issues.dlang.org/show_bug.cgi?id=24401
https://issues.dlang.org/show_bug.cgi?id=22556
#16194 is in and there's no appetite to address them for now.