flexdll icon indicating copy to clipboard operation
flexdll copied to clipboard

Support relocation kind 0003 and extend IMAGE_REL_ types

Open jonahbeckford opened this issue 1 year ago • 1 comments

Fixes #29 .

Specifically adds support for:

  • IMAGE_REL_AMD64_ADDR32NB. (This is the "relocation kind 0003").

  • IMAGE_REL_I386_DIR32NB. (This is the x86 version of the above).

  • IMAGE_REL_AMD64_REL32_5. (This is documented but weirdly not implemented).

Tested with my ucrt branch https://github.com/jonahbeckford/flexdll/tree/0.43%2Bucrt where relocation kind 0003 occurs often. It especially occurs in "normal" C libraries (ucvrt; /MD) that are linked into an OCaml executable. None of the tested code used /GS-.

References:

  • https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#coff-relocations-object-only
  • https://github.com/llvm/llvm-project/blob/279953f1da7d7d0e404638414deec5d1df291c7c/lld/COFF/Chunks.cpp#L113-L158

jonahbeckford avatar Jul 29 '24 02:07 jonahbeckford

I just noticed that I have not extended the DLL logic at https://github.com/ocaml/flexdll/blob/80496b51b9be9e128c6144377d20ccf6eff10129/create_dll.ml#L235-L253

That DLL logic does use the image_base for IMAGE_REL_AMD64_ADDR64 so at least in that code the IMAGE_REL_AMD64_ADDR64 math makes sense.

I don't really have a way to test that. The stubs/plugins generated by OCaml do not have those relocations (which is also why I didn't notice them).

jonahbeckford avatar Jul 29 '24 07:07 jonahbeckford

FIXED. This is an independent problem.

~This PR needs to stay open~ ... I am using this PR much more extensively and I'm encountering:

Fatal error: cannot load shared library dllthreads
Reason: flexdll error: cannot relocate caml_stat_free RELOC_REL32, target is too far: FFFFFFFF43282860  0000000043282860
Fatal error: cannot load shared library dllcurl_stubs
Reason: flexdll error: cannot relocate Caml_state RELOC_REL32, target is too far: FFFFFFFF2DAA3206  000000002DAA3206

etc.

@MisterDA: Looks like casting of int64/uint64 to uint32 is not done correctly ... I think it was just luck before if it was working. Your Avoid conversion from 'INT_PTR' to 'UINT32' warnings review suggestion is very close to where I'm seeing the failures, so it sounds like the C compiler was giving us a good warning. If you know how to make those INT_PTR/INT32/UINT32 operations robust, reply back! I'll investigate more in the meantime.

EDIT: After looking at old emails, it sounds like the checks are to make sure that the relocations are within 2GB, which presumes all the DLLs are within 2GB (here and here). And that begs the question what magic mechanism is enforcing that 2GB bin-packing! I wonder if there is a MSVC option to force all these problematic address references to be 64-bit on 64-bit machines.

EDIT 2: Changing CAMLextern to be __declspec(dllimport) extern on MSVC (https://github.com/diskuv/dkml-compiler/compare/b211c65d1ac69590ce75e9ae8405fb71a3654f63...main) forces the address references to be 64-bit. And there will be packages like https://github.com/mirage/digestif/pull/157 that will need adjustments if they don't use CAMLextern.

jonahbeckford avatar Nov 26 '24 06:11 jonahbeckford

Indeed, the docs in flexdll are out of date w.r.t. __declspec(dllimport). FlexDLL is showing its age, in that when it was first written Windows was still overwhelmingly 32-bit! Originally, the magic was that by specifying a base address for the DLL by default simply caused the 64-bit to load the DLLs close - there was never a guarantee, it just worked reliably for quite some time.

Except that we lost Cygwin64 support in 2018 with the release of Cygwin's rebase 4.4.4 (see this thread for the explanation of why that change was made). That killed shared library / natdynlink support for Cygwin until 4.12 (and https://github.com/ocaml/ocaml/pull/9927). That's when the __declspec(dllimport) mechanism was restored (and is when the README ought to have been updated).

IIRC mingw-w64 had never clamped the base address because it hadn't been necessary, but that started to become necessary with binutils 2.36 - at which point (https://github.com/ocaml/ocaml/pull/10351) co-opted the same mechanism for mingw-w64. Hwoever, msvc64 still uses the base address trick.

dra27 avatar Nov 28 '24 11:11 dra27

Are those failures with msvc64 or mingw-w64, @jonahbeckford? IIRC opam-repository-mingw "fixed" the binutils thing by forcing the base address for mingw-w64 instead, which means that this will be becoming a gradually bigger problem, requiring more uses of CAMLextern in libraries, indeed.

The systhreads failure is surprising, as I thought that was only in 5.1+ (and is fixed in 5.2.1 and 5.3.0) - which version of OCaml is that one with?

dra27 avatar Nov 28 '24 11:11 dra27

Are those failures with msvc64 or mingw-w64, @jonahbeckford? IIRC opam-repository-mingw "fixed" the binutils thing by forcing the base address for mingw-w64 instead, which means that this will be becoming a gradually bigger problem, requiring more uses of CAMLextern in libraries, indeed.

msvc64

The systhreads failure is surprising, as I thought that was only in 5.1+ (and is fixed in 5.2.1 and 5.3.0) - which version of OCaml is that one with?

4.14.2 with the nine (9) ocaml-common-4_14-a* patches at https://github.com/diskuv/dkml-compiler/tree/main/src/p and two (2) non-DkML patches (see below) using the ucrt branch of flexdll (ucrt link in this PR's main comment).

ocaml-common-4_14-z02: This is relevant to systhreads. https://gist.github.com/jonahbeckford/6cdbfb559b3f2c0b4eb143210bed0d3f. Of course, that is orthogonal to this PR but I'm curious what needed to be fixed with it. (I only backport a change when I see a problem, but I generally don't monitor the OCaml 5 changelog. I was expecting bugfixes to be backported to the 4.14 "LTS" version but that hasn't been happening.)

jonahbeckford avatar Nov 28 '24 17:11 jonahbeckford

.... FlexDLL is showing its age

Thanks for the context! I feel at some point I might explore getting rid of flexdll entirely. There is a lot of complexity in flexdll for a a) use case centered around resolving circular references for plugins which I think was not a great design, and b) to remove the need for __declspec(dll{im,ex}port) which is no longer true today for 64-bit. Combine that with the sad state of flexdll security like missing ASLR and the (huge) effort I needed to remove /GS- from https://github.com/ocaml/flexdll/blob/a003c18c563ec661d29a304c0b7fb61910ac88d8/Makefile#L65

Basically, I think today there is little benefit and a lot of drawbacks where 10-20 years ago the balance was a lot more positive. But OCaml is deeply tied to the circular reference model for stubs and also (maybe related) deeply tied to a flat symbol namespace ... the two assumptions from ELF linking where DLL and dylib linking have better designs ... so I'm under no illusion that untangling the assumptions would be easy.

jonahbeckford avatar Nov 28 '24 21:11 jonahbeckford