Fix macOS ld: multiple errors: symbol count from symbol table and dynamic symbol table differ

Open ibuclaw opened this issue 1 year ago • 24 comments

Encode macosx_version_min or build_version into the object file, originally authored by @jacob-carlborg in #10476.

This has been simplified to omit parsing the SDK version. Based on what I see GCC is doing (-platform_version macos $version_min 0.0), this information is not required in order for things to work.


Either this will fix the new ld errors, or we'll have to start adding -L-ld_classic to the linker command.

https://forum.dlang.org/thread/[email protected]

ibuclaw avatar Feb 11 '24 21:02 ibuclaw

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#16178"

dlang-bot avatar Feb 11 '24 21:02 dlang-bot

ld: load command #3 extends beyond the end of the load commands file '../generated/osx/release/64/lexer.o' for architecture x86_64

ibuclaw avatar Feb 11 '24 21:02 ibuclaw

It looks like encoding this information into the binary isn't working, but passing it via linker flags works just fine. Someone (not me) will have to repurpose the OSX version handling for dmd.link then.

ibuclaw avatar Feb 11 '24 22:02 ibuclaw

If no one picks this up, I'm going to start downgrading all macOS pipelines to ignore failures, as this is blocking other work.

ibuclaw avatar Feb 11 '24 22:02 ibuclaw

I think all that link.d needs is this code:

argv.push("-L-platform_version");
argv.push("-Lmacos");
argv.push("-L0.0");

to be added here https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L610.

I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.

RazvanN7 avatar Feb 12 '24 11:02 RazvanN7

I think all that link.d needs is this code:

argv.push("-L-platform_version");
argv.push("-Lmacos");
argv.push("-L0.0");

to be added here https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L610.

As far as I can tell, the syntax of the option is:

-platform_version <platform> <min_version> <sdk_version>

https://opensource.apple.com/source/ld64/ld64-530/doc/man/man1/ld.1.auto.html (Formatted version)

-platform_version platform min_version sdk_version
            This is set to indicate the platform, oldest supported version of that platform that
            output is to be used on, and the SDK that the output was built against.  platform is a
            numeric value as defined in <mach-o/loader.h>, or it may be one of the following
            strings:
            o macos
            o ios
            o tvos
            o watchos
            o bridgeos
            o mac-catalyst
            o ios-simulator
            o tvos-simulator
            o watchos-simulator
            o driverkit
            Specifying a newer min or SDK version enables the linker to assume features of that OS
            or SDK in the output file. The format of min_version and sdk_version is a version number
            such as 10.13 or 10.14

The sdk_version doesn't seem to be important, but the min_version would be rejected if invalid.


I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.

getenv and sscanf (the first commit in this PR is a simplified version of #10476 - https://github.com/dlang/dmd/pull/16178/commits/c01f8cc77e03be9b9fde4d8cb1e3b4ba886615da)
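
For illustration only, here is a rough shell analogue of that getenv/sscanf step. The name parse_deployment_target is hypothetical, the 10.9 fallback is borrowed from the CI flag quoted above, and the compiler itself would do this in D rather than in shell.

```shell
# Hypothetical helper: read MACOSX_DEPLOYMENT_TARGET (falling back to 10.9)
# and print "major minor". Anything that is not dot-separated digits is rejected.
parse_deployment_target() {
    target="${MACOSX_DEPLOYMENT_TARGET:-10.9}"
    case "$target" in
        *[!0-9.]*|.*|*.|*..*)
            echo "invalid deployment target: $target" >&2
            return 1
            ;;
    esac
    major="${target%%.*}"
    case "$target" in
        *.*) rest="${target#*.}"; minor="${rest%%.*}" ;;
        *)   minor=0 ;;   # a bare "11" is treated as 11.0
    esac
    echo "$major $minor"
}
```

A caller could then feed the two numbers into whatever builds the linker command line.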

I wouldn't hard-code -L-platform_version without first checking the version of ld64 (seems to be >= 512 - https://github.com/apple-opensource/ld64/commit/03dfd5524c89ea8e1d3a01c8a487934fb5433514#diff-eaf444f2d9662234ef8a047f73d6b25b7dd2dc39c3e426c6214b4702fec8287bR251), as it's a new option, and older ld64 linkers won't understand it:

  • -macosx_version_min 10.9 for all versions up to 10.14 inclusive.
  • -platform_version macos 10.9 0.0 for all versions from 10.15 onwards (DMD releases are still built on a macOS 10.13 box).
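
A minimal sketch of that selection, assuming the cut-off is decided from the host macOS version exactly as in the two bullets above. The function name min_version_flags is hypothetical, and the sketch does not query ld itself for its version.

```shell
# Hypothetical helper: print the minimum-version linker option for a host.
#   $1 = host macOS version (e.g. 10.13 or 13.6)
#   $2 = deployment target  (e.g. 10.9)
min_version_flags() {
    host="$1"
    target="$2"
    major="${host%%.*}"
    rest="${host#*.}"
    if [ "$rest" = "$host" ]; then
        rest=0                      # no minor component was given
    fi
    minor="${rest%%.*}"
    if [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 15 ]; }; then
        # 10.15 onwards: new-style option, SDK version left as 0.0
        echo "-platform_version macos $target 0.0"
    else
        # up to 10.14 inclusive: old-style option
        echo "-macosx_version_min $target"
    fi
}
```

When going through dmd rather than calling ld directly, each word of the result would need its own -L prefix, as in the argv.push snippet quoted earlier.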

ibuclaw avatar Feb 12 '24 13:02 ibuclaw

Hmm, so building the compiler itself with the host compiler is no problem at all, but compiling build.d is/was? WTF?

kinke avatar Feb 12 '24 14:02 kinke

Hmm, so building the compiler itself with the host compiler is no problem at all, but compiling build.d is/was? WTF?

I think CI_DFLAGS is used to build the host compiler. As for all the testsuite, my only guess is they aren't complex enough.

ibuclaw avatar Feb 12 '24 16:02 ibuclaw

CI_DFLAGS is used to build the fresh compiler, but only set for FreeBSD. DMD itself is surely much much more complex than build.d, so when using the same host compiler to build both, I'd obviously expect at least the same issues for the actual compiler as for build.d.

kinke avatar Feb 12 '24 18:02 kinke

Plus we have https://github.com/dlang/dmd/blob/6aa84abc575df70ed06b86629218b5ab9d70bf77/.github/workflows/main.yml#L85

which should AFAIK prevent using the latest Xcode features/regressions.

kinke avatar Feb 12 '24 18:02 kinke

I think CI_DFLAGS is used to build the host compiler.

Oh of course it is; I just somehow totally missed that you extend them here in this PR. facepalm [The added tab-indentation doesn't really help.]

kinke avatar Feb 14 '24 23:02 kinke

Plus we have https://github.com/dlang/dmd/blob/6aa84abc575df70ed06b86629218b5ab9d70bf77/.github/workflows/main.yml#L85

which should AFAIK prevent using the latest Xcode features/regressions.

Right, I can see a warning about both conflicting in the CI logs (-platform_version wins)

I could set this to 11 to see if it still fails.

ibuclaw avatar Feb 15 '24 00:02 ibuclaw

10.9 good, 11.0 fails

10.12 fails

10.11 fails

10.10 fails

10.9 second attempt fails.

ibuclaw avatar Feb 15 '24 01:02 ibuclaw

Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.

ibuclaw avatar Feb 15 '24 08:02 ibuclaw

-ld_classic gets us further, then fails later in the expected way.

ibuclaw avatar Feb 15 '24 09:02 ibuclaw

Removing macOS as a required pipeline.

ibuclaw avatar Feb 15 '24 09:02 ibuclaw

10.9 good, 11.0 fails […] 10.9 second attempt fails. […] Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.

They must have changed the runner base image; the CI job headers should include the image version somewhere.

The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (https://github.com/actions/runner-images/pull/9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...

kinke avatar Feb 16 '24 19:02 kinke

So in that latest image bump, they changed the default Xcode version from v14 to v15.

Running sudo xcode-select -switch /Applications/Xcode_14.1.app (early) on macos-13 jobs might allow us to use the currently oldest installed Xcode version, hopefully without these 'nice' new Apple ld64 problems.
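
Rather than hard-coding 14.1, a job could look up the oldest installed Xcode. The following is only a sketch: oldest_xcode is a hypothetical name, it assumes the Xcode_<version>.app naming seen in the path above, and it relies on a sort that supports -V.

```shell
# Hypothetical helper: print the lowest-versioned Xcode_<version>.app
# found under a directory (default: /Applications).
oldest_xcode() {
    dir="${1:-/Applications}"
    for app in "$dir"/Xcode_*.app; do
        [ -e "$app" ] || continue       # glob matched nothing
        name="${app##*/}"               # e.g. Xcode_14.1.app
        ver="${name#Xcode_}"            # e.g. 14.1.app
        echo "${ver%.app} $app"         # "<version> <path>"
    done | sort -V | head -n 1 | cut -d' ' -f2-
}

# Possible use on a runner (not executed here):
#   sudo xcode-select -switch "$(oldest_xcode)"
```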

Edit: https://github.com/dlang/dmd/pull/16194

kinke avatar Feb 16 '24 19:02 kinke

They must have changed the runner base image; the CI job headers should include the image version somewhere.

The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (actions/runner-images#9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...

It was first seen to fail 2 weeks ago.

https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831

It's probably been a very slow rollout.

ibuclaw avatar Feb 16 '24 19:02 ibuclaw

It was first seen to fail 2 weeks ago. https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831

Oh wow right, using that image already back then (February 6th):

 Runner Image
  Image: macos-13
  Version: 20240204.1

So indeed, looks like the rollout is extremely slow, and starts way before the image readme.md is updated...

kinke avatar Feb 16 '24 19:02 kinke

Github search to the rescue.

https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/mach_o/Image.cpp#L346-L347

    if ( hasIndSymTab && (symCount != indSymCount))
        return Error("symbol count from symbol table and dynamic symbol table differ");

Simplifying the above for mere mortals (such as myself).

 dysymtab_cmd.nlocalsym  = local_symbuf.length;
 dysymtab_cmd.iextdefsym = dysymtab_cmd.nlocalsym;
 dysymtab_cmd.nextdefsym = public_symbuf.length;
 dysymtab_cmd.iundefsym = dysymtab_cmd.iextdefsym + dysymtab_cmd.nextdefsym; 
 dysymtab_cmd.nundefsym  = extern_symbuf.length + comdef_symbuf.length;
 symtab_cmd.nsyms =  dysymtab_cmd.nlocalsym + 
                     dysymtab_cmd.nextdefsym + 
                     dysymtab_cmd.nundefsym; 

-> const prop

 dysymtab_cmd.iundefsym = local_symbuf.length + public_symbuf.length; 
 dysymtab_cmd.nundefsym  = extern_symbuf.length + comdef_symbuf.length;
 symtab_cmd.nsyms =  local_symbuf.length + public_symbuf.length + dysymtab_cmd.nundefsym; 

So the linker should see:

symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;

It's a bit convoluted, but they add up to the same number, which means this might just be corruption of the data structure as the linker reads in the object.
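
The arithmetic can be replayed with made-up numbers. In this sketch the four buffer lengths are hypothetical values chosen only to exercise the assignments; they are not taken from a real object file.

```shell
# Hypothetical buffer lengths (not measured from any object file).
local_len=2
public_len=3
extern_len=1
comdef_len=1

# Same assignments as in the snippet above.
nlocalsym=$local_len
iextdefsym=$nlocalsym
nextdefsym=$public_len
iundefsym=$((iextdefsym + nextdefsym))
nundefsym=$((extern_len + comdef_len))
nsyms=$((nlocalsym + nextdefsym + nundefsym))

# What the linker check compares.
sym_count=$nsyms
ind_sym_count=$((iundefsym + nundefsym))
echo "symCount=$sym_count indSymCount=$ind_sym_count"
# prints: symCount=7 indSymCount=7
```

Whatever lengths are plugged in, both sides reduce to local + public + extern + comdef, so a mismatch would have to come from somewhere other than these assignments.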

ibuclaw avatar Feb 16 '24 23:02 ibuclaw

I don't know entirely what's up with Apple's static linker, but I do know they wrote a new one recently, hence the -ld_classic flag. The error message that @ibuclaw found is from the dynamic linker (dyld). This error message is not available in any released source code of the static linker (ld64), as far as I can tell. Perhaps the static linker is now performing the same check that the dynamic linker already did, but I haven't seen this error at load time.

Anyway, as @ibuclaw posted above, the number of symbols in the symbol table and dynamic symbol table need to be the same. A quick investigation shows this:

  1. Compile (without linking) the following code with LDC and DMD:

extern (C) int printf(const char*, ...);

void main()
{
    printf("foo\n");
}

     ldc2 -c -ofmain-ldc.o main.d
     dmd -c -ofmain-dmd.o main.d

  2. Print the load commands of each object file:

     otool -l main-ldc.o
     otool -l main-dmd.o

This gives the following output:

DMD:

LC_SYMTAB: nsyms: 10

LC_DYSYMTAB: iundefsym: 6 nundefsym: 3

LDC:

LC_SYMTAB: nsyms: 12

LC_DYSYMTAB: iundefsym: 10 nundefsym: 2

LC_SYMTAB is the symbol table and LC_DYSYMTAB is the dynamic/indirect symbol table.

As mentioned above, the number of symbols is calculated as:

symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;

According to otool, the two counts agree for LDC (12 = 10 + 2) but not for DMD (10 vs. 6 + 3 = 9): LC_DYSYMTAB accounts for one symbol fewer than LC_SYMTAB.
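
The comparison can be automated. This is a sketch, not a vetted tool: check_symbol_counts is a hypothetical name, and it assumes otool -l prints each field as a "name value" pair on its own line and that the file holds a single architecture.

```shell
# Hypothetical helper: read `otool -l` output on stdin and compare
# nsyms (LC_SYMTAB) with iundefsym + nundefsym (LC_DYSYMTAB).
# Exit status is 0 when the counts agree, 1 otherwise.
check_symbol_counts() {
    awk '
        $1 == "nsyms"     { nsyms  = $2 }
        $1 == "iundefsym" { iundef = $2 }
        $1 == "nundefsym" { nundef = $2 }
        END {
            ind = iundef + nundef
            verdict = (nsyms == ind) ? "ok" : "MISMATCH"
            printf "symCount=%d indSymCount=%d %s\n", nsyms, ind, verdict
            exit (nsyms == ind) ? 0 : 1
        }
    '
}

# Possible use (needs otool or llvm-otool, so not executed here):
#   otool -l main-dmd.o | check_symbol_counts
```

Fed the numbers reported above, it would print symCount=10 indSymCount=9 MISMATCH for the DMD object and symCount=12 indSymCount=12 ok for the LDC one.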

Also, trying to print the indirect symbol table gives these results:

$ otool -I main-ldc.o
main-ldc.o:
$ otool -I main-dmd.o
main-dmd.o:
indirect symbol table offset is past end of file

So DMD is doing something wrong.

jacob-carlborg avatar Feb 18 '24 20:02 jacob-carlborg

  1. Compile (without linking) the following code with LDC and DMD:

extern (C) int printf(const char*, ...);

void main()
{
    printf("foo\n");
}

     ldc2 -c -ofmain-ldc.o main.d
     dmd -c -ofmain-dmd.o main.d

  2. Print the load commands of each object file:

     otool -l main-ldc.o
     otool -l main-dmd.o

This gives the following output:

DMD:

LC_SYMTAB: nsyms: 10

Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.

ibuclaw avatar Feb 18 '24 21:02 ibuclaw

This is ridiculous

https://github.com/dlang/dmd/blob/9471b25db9ed44d71e0e27956430c0c6a09c16db/compiler%2Fsrc%2Fdmd%2Fbackend%2Fmachobj.d#L1409-L1430

ibuclaw avatar Feb 18 '24 21:02 ibuclaw

Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.

$ nm -a main-dmd.o
0000000000000050 s EH_frame0
0000000000000068 S _D main.eh
0000000000000038 S __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
                 U __Dmain
                 U __d_run_main
                 U _main
0000000000000018 T _main
0000000000000090 S _main._d_cmain!().main.eh
                 U _printf
$ nm -a main-ldc.o
0000000000000070 S __D4main11__moduleRefZ
0000000000000060 D __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
                 U __d_run_main
0000000000000020 T _main
                 U _printf
0000000000000054 s l_.str
0000000000000000 t ltmp0
0000000000000054 s ltmp1
0000000000000060 d ltmp2
0000000000000070 s ltmp3
0000000000000078 s ltmp4

The above shows all symbols. Unfortunately I haven't figured out which symbol table the nm command prints.

jacob-carlborg avatar Feb 19 '24 08:02 jacob-carlborg

BTW, DMD can now cross-compile. LLVM contains the necessary tools to inspect the object files (llvm-otool and llvm-nm), so a Mac is not strictly needed. Once otool prints the correct number of symbols, a real Mac can be used to test the actual link step. Here's an example:

$ docker run -it --rm --platform linux/amd64 ubuntu:22.04
root@99b0566a8b25:/# apt update && apt install -y curl xz-utils file
root@99b0566a8b25:/# curl -L -O --retry 3 https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.6/clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# curl -L -O --retry 3 https://downloads.dlang.org/releases/2.x/2.107.0/dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# tar xf clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# tar xf dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# ./dmd2/linux/bin64/dmd -target=x86_64-darwin main.d -c
root@99b0566a8b25:~# file main.o
main.o: Mach-O 64-bit x86_64 object, flags:<|SUBSECTIONS_VIA_SYMBOLS>
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-otool -l main.o
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-nm -a main.o

If you need access to a real Mac, you can also borrow a GitHub Action runner using this action: https://github.com/marketplace/actions/debugging-with-tmate. I've used it a lot to debug pipelines and my own action.

jacob-carlborg avatar Feb 19 '24 09:02 jacob-carlborg

I count no fewer than half a dozen issues.

https://issues.dlang.org/show_bug.cgi?id=23517
https://issues.dlang.org/show_bug.cgi?id=24137
https://issues.dlang.org/show_bug.cgi?id=24399
https://issues.dlang.org/show_bug.cgi?id=20297
https://issues.dlang.org/show_bug.cgi?id=24402
https://issues.dlang.org/show_bug.cgi?id=24401
https://issues.dlang.org/show_bug.cgi?id=22556

#16194 is in and there's no appetite to address them for now.

ibuclaw avatar Feb 20 '24 23:02 ibuclaw