Relocatable OCaml - Searching and Suffixing

Open dra27 opened this issue 5 months ago • 13 comments

This is the third of three PRs which implement Relocatable OCaml as proposed in ocaml/RFCs#53. Bytecode executables (including those produced when building the compiler distribution itself) usually contain an absolute path for the location of ocamlrun, which is incompatible with Relocatable OCaml. The patches here provide an alternate mechanism for these executables to find the interpreter without needing its absolute location. This change is combined with a name mangling scheme which is used both for the bytecode interpreter executables' filenames and holistically to fix long-standing issues with the naming of shared libraries (both the shared runtime libraries and shared bytecode C stub libraries). Together, the patches address section 2 of the RFC.

There are several mechanisms for linking bytecode executables. This PR is exclusively concerned with standalone bytecode executables, which are those where the compiled bytecode image is prefixed with a launcher, but not with the OCaml runtime interpreter itself. This launcher can be a simple "shebang" line (e.g. #!/path/to/ocamlrun) or a small executable, compiled from stdlib/header.c. In this case, "standalone" refers to the image being separate from the runtime, rather than to the executable itself being standalone. There are situations where the interpreter itself cannot be used in a shebang line, and in this case, the compiler today instead emits a tiny shell script using #!/bin/sh as the interpreter.
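As a rough illustration of the two text-based launchers described above, here is a minimal sketch. It is not the code in bytecomp/bytelink.ml: the function names, the 125-character limit and the exact shape of the shell script are assumptions made for the example.

```ocaml
(* Hypothetical sketch of choosing between a direct shebang line and a
   #!/bin/sh wrapper. Names and limits here are assumptions. *)

(* Assumed rule: a path cannot go straight into a #! line if it contains
   whitespace or is too long for the kernel's shebang buffer. *)
let invalid_for_shebang path =
  String.length path > 125
  || String.contains path ' '
  || String.contains path '\t'
  || String.contains path '\n'

(* Quote a string for POSIX sh using single quotes. *)
let quote_sh s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '\'';
  String.iter
    (function
      | '\'' -> Buffer.add_string b "'\\''"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '\'';
  Buffer.contents b

(* Text placed in front of the bytecode image for the shebang launcher. *)
let shebang_header runtime =
  if invalid_for_shebang runtime then
    "#!/bin/sh\nexec " ^ quote_sh runtime ^ " \"$0\" \"$@\"\n"
  else
    "#!" ^ runtime ^ "\n"
```

With these assumptions, shebang_header "/usr/local/bin/ocamlrun" yields a single #! line, while a path containing a space falls back to the #!/bin/sh form.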

Windows does not support shebang executables, always using the executable stub. In order to assist the old binary distributions of Windows OCaml, the Windows version of the executable stub has always performed a PATH-search for ocamlrun. However, this was done at a time when it was expected that a user would have a single installation of a single version of OCaml on their system, which is no longer true. It is very much the case today that an ocamlrun in PATH has little to no guarantee of being the ocamlrun required by a given bytecode executable on a user's machine. Therefore, a name mangling scheme is also proposed - i.e. we increase the ways in which a bytecode executable may seek to find its runtime, but by refining the name of the file it's searching for, we increase the chances of finding the correct binary and of multiple installations of OCaml not interfering with each other. Fundamentally, this simply means that two different versions of OCaml (either a different release, or a relevantly different configuration) have different names for the bytecode interpreter. There are two crucial consequences to this: the error messages when things do go wrong are much better (being along the lines of "I can't find an interpreter for OCaml 5.5" rather than "bad magic number", "symbol not found" or just a segfault) and it also stops things from "silently working" and then suddenly failing one day because a release of OCaml happened to add a new function to the Unix library. This name mangling scheme is likewise applied to the shared libraries loaded by the interpreter.

The key changes are:

  • A new command line option for ocamlc, -launch-method, allows dynamic selection of either the shebang (#!/usr/bin/ocamlrun) or executable-stub launcher for standalone bytecode executables. This option allows the metadata in the runtime-launch-info file to be removed.
  • A new command line option for ocamlc, -runtime-search, allows a new mechanism to be specified for the header of standalone bytecode executables where instead of only executing ocamlrun from a fixed location, they are instead able to search for it.
  • A name mangling scheme is introduced to be used for shared libraries (both the shared library versions of the OCaml runtime and also for bytecode C stub shared libraries) and the bytecode interpreter executables (ocamlrun, etc.).
  • A new pair of command line options for ocamlmklib, -suffixed and -no-suffixed, and a new command line option for ocamlc, -dllib-suffixed, provide a mechanism for using this name mangling scheme for bytecode C stub shared libraries. This mechanism is transparent to the user, for example #load "unix.cma" continues to work in the toplevel, but the interpreter executing the toplevel (i.e. ocamlrun) searches for a DLL based on its configuration.
  • A new configure option, --enable-suffixing, which is enabled by default, uses this name mangling scheme for the bytecode interpreter executables and the shared library versions of the OCaml runtime. In particular, this means that the bin directory of two different OCaml compilers may appear in PATH and the lib/ocaml directory of two different OCaml runtimes in LD_LIBRARY_PATH, but executables compiled for either of those versions of OCaml continue to load correctly.
  • A new pair of configure options, --enable-runtime-search and --enable-runtime-search-target, control how the bytecode executables of the compiler distribution and those produced by the compiler distribution respectively search for the runtime. In particular, --enable-runtime-search[=always] builds a compiler whose bytecode executables will continue to work correctly if the compiler is moved or copied to a new location after installation.

The commit series is in three phases:

Searching

The goal of the commits in this first phase of the series is to provide the ability to have the launcher not require the absolute location of the interpreter. Fundamentally, this involves extending both stdlib/header.c and the sh-script produced by bytecomp/bytelink.ml. As with -set-runtime-default in #14244, this is a facility which is needed by some user executables (in particular, any bytecode executables installed in a "relocatable" opam switch) but not by others. This is a subtlety which I missed in the original implementation of this in 2021, which added the configuration to the runtime-launch-info file, providing the ability to build the compiler distribution with relocatable bytecode binaries (by setting stdlib/runtime-launch-info appropriately), but forcing executables produced by that compiler distribution either to be all relocatable or all not relocatable (by setting stdlib/target_runtime-launch-info). There is therefore a clear need for a command line option to select the search mode of the launcher.

Additionally, with most of the changes in this PR needing to be made equivalently between bytecomp/bytelink.ml (in POSIX Shell Command Language) and stdlib/header.c, it's desirable to be able to test executables produced with both the shebang launcher and executable launcher on the same system, but the only way to control this option is by changing the runtime-launch-info file. It'd be just about acceptable to have to do that for the test harness, but it hints at the desirability of a command line option to select between the shebang launcher and executable launcher.

Having accepted that a command line option is needed to control the search mode of the launcher, it then seems strange to be encoding a default value for it in stdlib/target_runtime-launch-info, rather than in the Config module. Similarly, having accepted the addition of a command line option to select between shebang/executable for the launcher, it made me revisit the design in #12751 for runtime-launch-info. In addition to the launcher kind (shebang/executable), runtime-launch-info also contains the configured installation location of the binaries (which, for the installed runtime-launch-info file, will also match Config.bindir in ocamlcommon) and the executable launcher itself. Given that both launcher kind and search mode are proposed to be conveyable by command line option, it seems to me to be sensible to add a command line mechanism to convey the location of the runtime interpreter executables to ocamlc and change runtime-launch-info to be just the compiled executable from stdlib/header.c, with the default values for launcher kind, search mode and binary directory residing where they belong in the Config module. This makes runtime-launch-info a cross-compilation concern only, and eliminates any difference between boot/ocamlc and ./ocamlc during the build. The only caveat is that when linking we must always be explicit about the launcher kind, Config.bindir and the search mode because boot/ocamlc cannot have defaults for these. I think using command line options this way is not only simpler than #12751's use of runtime-launch-info but is semantically simpler than the camlheader files which it replaced. That change is therefore made as part of this commit series, but the explanation is here to motivate why it has been done and also to make it clear that it's a necessary change to make, especially as the older implementation not doing it this way still exists.
The underlying principle is that runtime-launch-info contains only things which are properties of the library around it and not things which can be altered when the compiler is invoked (i.e. stdlib/header.c has been compiled for a given configuration).

  • The first three commits are not strictly related, but fall in areas which are updated by this PR - somewhat hilariously, it turns out that #12751's logic for finding sh is incorrect on Solaris. An indentation error of a large part of a function in #14014 slipped through but, more importantly, the error reporting in the bytecode binaries test left something to be desired - especially given that the aforementioned Solaris problem triggered a failure in this test.
  • Next is a largely mechanical simplification to some logic in bytecomp/bytelink.ml. Previously, -use-runtime and -runtime-variant were processed first, yielding a boolean use_runtime which indicated whether -use-runtime had been specified and a value for runtime. If -use-runtime was not specified and the compilation is not for Windows, then the runtime value is appended to the configured location for the interpreters specified in runtime-launch-info, i.e. "ocamlrun" computed in the first check becomes "/usr/local/bin/ocamlrun" in the second step on Unix, but remains "ocamlrun" on Windows. However, if -use-runtime is specified, then the value is unaltered. This is all a bit obtuse (and I think probably my fault originally...), and it's much clearer to combine these.
  • Next, the build system is extended to support single quotes in --prefix. If someone can, they probably will (in this case, it was useful to test that the various de-quoting functions needed in the harness would not trip over single quotes in the values themselves).
  • ocamlobjinfo is extended to display information on the runtime of a standalone bytecode executable (the tools/ocamlsize Perl script already has this ability). Given the various changes with name mangling, this seems a very pertinent piece of information to be able to obtain. For the executable launcher, this is simply the content of the RNTM bytecode section. For the shebang launcher, it has to be read from the shebang or shell script. Code to do this is already present in the test harness from #14014, but because this parsing gets more complex with subsequent changes, it's instead rewritten as a lexer with the new function Byterntm.read_runtime.
  • -launch-method is added to ocamlc and then used in the in-prefix tests so that systems which support shebang scripts test both the executable launcher and the shebang launcher. The option takes a single parameter which exactly corresponds to the first line of runtime-launch-info.
  • -launch-method is then extended to support an additional part of the argument specifying the directory containing the binaries so that -launch-method 'sh /home/user/bin' simultaneously conveys both that a shebang launcher is to be used and that the interpreters reside in /home/user/bin. Once bootstrapped, -launch-method can now be used in boot/ocamlc to remove the need for the first two lines of boot/runtime-launch-info. Config.target_bindir and Config.launch_method contain the two lines otherwise added to stdlib/target_runtime-launch-info. The rest is plumbing, with the removal of all the parsing code for runtime-launch-info and a certain amount of temporary plumbing to cope with whether boot/ocamlc has been bootstrapped or not.
  • -runtime-search is then implemented, which provides three modes for locating the bytecode interpreter. disable is the existing behaviour, and requires the runtime to be located at the absolute location the compiler was configured with. always instead first looks in the directory containing the bytecode executable itself and then searches PATH, if necessary. In particular, binaries installed in an opam switch's bin directory always use the runtime in that switch's bin directory. Finally, enable provides a hybrid of both approaches where the bytecode executable looks for the runtime in the absolute location the compiler was configured with (the disable mode), then looks in the directory containing the bytecode executable, then searches in PATH (the always mode). The -launch-method and -runtime-search options together make it trivial to test all 6 combinations in the test harness. At this stage, Windows defaults to -runtime-search always where Unix defaults to -runtime-search disable, which just about corresponds to the behaviour of the Windows executable launcher, with one slight improvement. Previously, the runtime (e.g. ocamlrun) was searched for in Path using SearchPathW. The new behaviour first checks the directory containing the bytecode executable, which is a general usability improvement (this would have been useful when the change was originally made, for example, as it would have removed the need to put C:\Program Files\OCaml\bin into Path) and also brings the search behaviour more into line with LoadLibraryW (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryw) which, when loading a DLL, first looks in the directory containing the executable.
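
The lookup order of the three -runtime-search modes can be sketched as follows. This is only an illustration of the ordering described in the last bullet, not the launcher's implementation (which lives in stdlib/header.c and the generated shell script); the type, function and parameter names are hypothetical.

```ocaml
(* Mode names follow the description above; everything else is hypothetical. *)
type search_mode = Disable | Enable | Always

(* Ordered list of locations a launcher would try for the interpreter
   [name], given the directory the compiler was configured with, the
   directory containing the bytecode executable, and the PATH entries. *)
let candidates ~mode ~configured_bindir ~exe_dir ~path_dirs name =
  let in_dir dir = Filename.concat dir name in
  let configured = in_dir configured_bindir in
  let searched = in_dir exe_dir :: List.map in_dir path_dirs in
  match mode with
  | Disable -> [configured]
  | Always -> searched
  | Enable -> configured :: searched

(* First candidate that exists on disk, if any. A real launcher would
   also need to check that the file is executable. *)
let find_runtime paths = List.find_opt Sys.file_exists paths
```

For example, with ~configured_bindir:"/usr/local/bin", ~exe_dir:"/home/user/bin" and ~path_dirs:["/usr/bin"], the Enable mode tries the configured location first, then the executable's own directory, then /usr/bin.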

Suffixing

With the compiler now able to search for its runtimes, the next commits introduce an approach to name mangling and use it to suffix the filenames of the bytecode interpreters, shared runtime libraries and shared bytecode stub libraries.

I have attempted to document the mangling scheme in-tree in runtime/Mangling.md. In summary:

  • We have various bits of information - each bit is either a property of the distribution (e.g. its version) or a property of its particular configuration (e.g. --enable-flambda, --disable-flat-float-array, --without-zstd, etc.)
  • The configuration bits can affect a combination of the native runtime, the bytecode runtime and the bytecode interpreter
  • The interpreter executables are found using just the bits that affect reading that bytecode image (version, marshalled compression, int63 constants, shared libraries, etc.)
  • Each particular ocamlrun interpreter then loads .so based on the bits affecting the bytecode runtime - which includes bits which do not affect the execution of the bytecode itself. In particular, .so files are mangled with the machine triplet

The sequence:

  • The first commit simply sets up the infrastructure for these IDs.
  • The ID is then used to mangle the bytecode interpreter filenames, with --disable-suffixing added to configure to keep the existing behaviour. ocamlrun is now installed as xxxx-ocamlrun-bbbb where xxxx is the triplet that the runtime executes on and bbbb is the Bytecode Runtime ID. This executable is symlinked as ocamlrun and also as ocamlrun-zzzz where zzzz is the Zinc Runtime ID. When the tree is configured with --enable-suffixing, bytecomp/bytelink.ml uses ocamlrun-zzzz rather than ocamlrun when determining the runtime name. Note that -use-runtime is unaffected - if the runtime to use is explicitly stated then it overrides the name mangling. The information for the Zinc Runtime ID is split into two halves - the low bits, consisting of the release number and, intentionally, bits which are always zero, are universal and so come from Config (these are the first two characters). The high bits, which are configuration-specific, are put in runtime-launch-info (recall from the earlier stripping of information - this is information which cannot be dynamically changed when invoking the compiler) and merged by bytecomp/bytelink.ml.
  • libasmrun.so and libcamlrun.so then get the same treatment, becoming libasmrun-xxxx-nnnn.so and libcamlrun-xxxx-bbbb.so with a symlink created with the original unmangled name. The use of -runtime-variant for enabling the shared runtime has been something of a hack since it was originally added, but that's for another day - for now, both asmcomp/asmlink.ml and bytecomp/bytelink.ml recognise -runtime-variant "_shared" and correctly mangle the name.
  • Finally, the scheme is extended to the bytecode C stub libraries so that dllunixbyt.so becomes dllunixbyt-xxxx-bbbb.so. The implementation is mostly mechanical. Although the cma format is updated, note that the bootstrap can be delayed because boot/ocamlc is, by definition, only ever passed cma files which have lib_dllibs = [], so it doesn't matter that boot/ocamlc has the wrong "type", because it will never see a list.
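
To make the naming patterns in this sequence concrete, here is a small sketch of how the filenames fit together. The function names are hypothetical, the IDs are treated as opaque strings (their actual encoding is documented in runtime/Mangling.md), and the placeholders xxxx, bbbb and zzzz are the ones used in the text.

```ocaml
(* Hypothetical helpers; [triplet] is xxxx, [bytecode_id] is bbbb and
   [zinc_id] is zzzz in the notation used above. *)

(* Installed name of an interpreter, e.g. xxxx-ocamlrun-bbbb. *)
let interpreter_file ~triplet ~bytecode_id name =
  Printf.sprintf "%s-%s-%s" triplet name bytecode_id

(* Symlinks pointing at the installed interpreter. *)
let interpreter_links ~zinc_id name =
  [name; Printf.sprintf "%s-%s" name zinc_id]

(* Mangled name of a bytecode C stub library, e.g. dllunixbyt-xxxx-bbbb.so. *)
let stub_library_file ~triplet ~bytecode_id stem =
  Printf.sprintf "%s-%s-%s.so" stem triplet bytecode_id

(* Runtime name recorded by the linker: an explicit -use-runtime value
   wins, otherwise the Zinc-suffixed name is used when suffixing is on. *)
let launcher_runtime_name ~suffixing ~zinc_id ~use_runtime name =
  match use_runtime with
  | Some explicit -> explicit
  | None -> if suffixing then name ^ "-" ^ zinc_id else name
```

The last function reflects the note that -use-runtime overrides the name mangling; how the real linker threads these values through is not shown here.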

Bootstrap and utilisation

The final phase of the commits allows the compiler to use all of these features. Each one of the introduced changes in the second phase notionally requires a bootstrap, but the commit series has been organised such that only a single bootstrap is required, with a series of ~~elegant workarounds~~gross hacks then removed in the following commit. Finally, the plumbing to implement --enable-runtime-search and --enable-runtime-search-target is available, along with the updates to the tests. Note that in the interests of sanity, but not for any particular implementation reasons, --enable-runtime-search and --enable-runtime-search-target both require --enable-suffixing (i.e. the escape hatch is there to allow all of this to be disabled, but it must then all be disabled).

dra27 avatar Sep 15 '25 00:09 dra27

Thanks, @shym! I'll endeavour to rebase this and then put the various fixups inline as extra commits to ease checking (I'll double-check whether the tabs-in-Makefiles is definitely what I'd had in mind as well!)

dra27 avatar Oct 22 '25 17:10 dra27

Rebased - review responses to follow

dra27 avatar Nov 09 '25 12:11 dra27

How's that looking, @shym? I still need to update the man pages and manual for -launch-method, -runtime-search and -dllib-suffixed, but I hope I've addressed everything else!

dra27 avatar Nov 10 '25 23:11 dra27

manpages and documentation now updated, too

dra27 avatar Nov 11 '25 10:11 dra27

Surfacing a side-channel discussion with @shym - I've updated the code around -runtime-search to use constructor names consistent with the present options accepted by the flag, which is disable, enable, always (so "runtime searching is disabled, runtime searching is enabled, runtime searching is always performed"), but we think it is clearer to go with the suggestion in the thread above: disable, fallback, enable, and I'll update the commits to reflect this.

dra27 avatar Nov 12 '25 09:11 dra27

(rebased prior to addressing @damiendoligez's review comments)

dra27 avatar Nov 30 '25 18:11 dra27

I messed up the renaming of the options for -runtime-search - pushed back to just having addressed all the other review comments, and I shall re-do that change a bit more carefully this time (well done CI...)

The changes made so far can be seen here.

dra27 avatar Dec 03 '25 15:12 dra27

Rebased to the same base as #14244

dra27 avatar Dec 10 '25 10:12 dra27

Rebased on to #14244, with various consequential updates (and current review commits squashed).

Renaming of the -runtime-search options still to go.

dra27 avatar Dec 10 '25 17:12 dra27

-runtime-search options are now disable (as before, which is the default behaviour via --disable-runtime-search), fallback (which was enable, now activated by default with --enable-runtime-search=fallback), and enable (which was always, now activated by default with --enable-runtime-search)
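
For readers comparing this with the PR description, the rename can be summarised as a mapping from the old option names to the new ones. The sketch below is illustrative only; the type and function names are hypothetical and are not taken from the compiler sources.

```ocaml
(* New option names, as listed above. *)
type search_mode = Disable | Fallback | Enable

(* Translate a name from the original PR description to the new mode. *)
let of_old_name = function
  | "disable" -> Some Disable   (* unchanged *)
  | "enable" -> Some Fallback   (* configured location first, then search *)
  | "always" -> Some Enable     (* always search *)
  | _ -> None

(* Spelling accepted by -runtime-search after the rename. *)
let to_string = function
  | Disable -> "disable"
  | Fallback -> "fallback"
  | Enable -> "enable"
```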

dra27 avatar Dec 11 '25 17:12 dra27

@shym, @damiendoligez - I believe all points are now addressed, and the rename is complete. Once this run has passed CI, I'll rebase on to trunk so that it can have a full run on precheck with #14409.

dra27 avatar Dec 11 '25 17:12 dra27

cf. the new commit https://github.com/ocaml/ocaml/pull/14245/commits/dd4a6190160c63c0f0f4533c81005148d51a0c0b added to the end of the series 🥳

dra27 avatar Dec 11 '25 17:12 dra27

Hopefully final rebase. Thank you @shym and @damiendoligez for the reviewing for this one as well! It's going through precheck#1089 and, assuming nothing gets thrown up by any of the final CI checks, let's be relocatable...

dra27 avatar Dec 11 '25 18:12 dra27

Merged. Congratulations @dra27!

nojb avatar Dec 12 '25 04:12 nojb