ocaml icon indicating copy to clipboard operation
ocaml copied to clipboard

Emancipate dynlink from compilerlibs

Open shindere opened this issue 2 years ago • 33 comments

As is well-known, currently the dynlink other library depends on compilerlibs. More precisely, it has to embed a copy of compilerlibs, meaning that compilerlibs has to be compiled twice, its dynlink-embedded version being compiled with -for-pack.

This PR, based on a suggestion by @stedolan, is a sequence of changes that emancipate dynlink from compilerlibs, making it depend on a smaller module, CamlinternalDynlink, which is added to the standard library.

Roughly speaking, the involved changes are the following ones:

  • Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
  • Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker
  • Putting the definition of magic numbers under configure's control, to make it possible and easy to use it at several places in the build system without dupplicating code
  • Roughly speaking, moving code around and sometimes splitting modules into two parts (e.g. the Dll module), one part used for compilation and static linkingthat remains in bytecomp and the other used for dynamic loadin gof objects by both dynlink and the toplevels, moved to the CamlinternalDynlink standard library module.

Each of the two first changes has required a compielr bootstrap. After these two changes, the types used to represent CMO and CMA files only use standard OCaml types and do not rely on any type internal to the compiler.

Currently, the PR works, meaning that itpasses the testsuite in both lambda and flambda modes.

On my machine, not compiling compilerlibs twice saves ~6s of total build time, which goes down from 4m58,261s to 4m52,251s. The size of dynlink.cma is reduced from 4,1M to 112K and the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).

I didn't undertake any diff testing so far because I wanted to gather some input first and make sure the version I'd use for diff testing would be as close as possible to what peope would agree to merge, especially given that rebasing this on 5.0 is non-trivial and could introduce errors, in particular because of the changes to the bytepackager module that happened after 5.0 had been branched.

Among the aspects that will IMO require some discussion, there is the way the Symtable module has been split. Currently, much of its code is duplicated and it'd certainly be good to avoid such a duplication as much as possible, given that, as far as symbol table is concerned, it may not be possible to make the toplevel as independent of compilerlibs as dynlink because it more heavily relies on the Lambda module, thus oon compiler internals.

Obviously this PR is best approached in a per-commit way.

shindere avatar Feb 07 '23 10:02 shindere

the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).

This file does not contain any code, you should look at the size of dynlink.a.

nojb avatar Feb 07 '23 10:02 nojb

Nicolás Ojeda Bär (2023/02/07 02:11 -0800):

the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).

This file does not contain any code, you should look at the size of dynlink.a.

My bad, sorry! Then the decrease in size is from 4,3M to 128K.

shindere avatar Feb 07 '23 10:02 shindere

Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals.

This change sounds like something that would be very useful for camlboot. I would be interested in considering it as a separate buildup PR that could be discussed and reviewed separately.

gasche avatar Feb 07 '23 10:02 gasche

Of course, one possibility is to split this PR in several smaller / easier to chew ones. The aim of this PR is to provide a complete story, it does not necessarily mean it has to be merged all at once, assuming it's appreciated.

shindere avatar Feb 07 '23 10:02 shindere

  • Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
  • Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker

I think these two changes (at least) should be split off into separate PRs since they have independent interest and can be reviewed in a self-contained manner.

nojb avatar Feb 07 '23 10:02 nojb

Gabriel Scherer (2023/02/07 02:21 -0800):

Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals.

This change sounds like something that would be very useful for camlboot.

Oh! Nice if it has another positive outcome!

I would be interested in considering it as a separate buildup PR that could be discussed and reviewed separately.

Shall I go ahead and split that out right now or rather wait a bit?

One thing to keep in mind about this change is that it has an impact on ocamlobjinfo, more precisely on the way constants are displayed. Indeed, structured constants carry some type information which are used to display constants in a nice way. With this change, this type information is lost and the way constants are printed becomes slightly less friendly (well, in other words, untyped).

shindere avatar Feb 07 '23 10:02 shindere

What is the motivation for adding an internal dynlink module to the standard library? This move seems to go in the opposite direction of the rest of the PR. This is an internal module to the compiler that exposes internal implementation details of the compiler, ,which is not used by other modules in the standard library. Do we really want to make this module visible to all users of the standard library and start offering backward-compatibility guarantee for this module?

Octachron avatar Feb 07 '23 10:02 Octachron

Nicolás Ojeda Bär (2023/02/07 02:32 -0800):

  • Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
  • Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker

I think these two changes (at least) should be split off into separate PRs since they have independent interest and can be reviewed in a self-contained manner.

Okay, will do.

shindere avatar Feb 07 '23 10:02 shindere

Florian Angeletti (2023/02/07 02:34 -0800):

What is the motivation for adding an internal dynlink module to the standard library? This move seems to go in the opposite direction of the rest of the PR. This is an internal module to the compiler that exposes internal implementation details of the compiler, ,which is not used by other modules in the standard library. Do we really want to make this module visible to all users of the standard library and start offering backward-compatibility guarantee for this module?

Well, I think the high level goal was to have a place from which it's easy to share code.

It was part of Stephen's suggestion to add thisshared module to the standard library but it may be possible to add it to the compiler itself, I guess it would be enough to achieve the purpose.

shindere avatar Feb 07 '23 10:02 shindere

start offering backward-compatibility guarantee for this module?

I thought we didn't offer any guarantees at all for the camlinternal* modules... Their API is not even documented anywhere.

nojb avatar Feb 07 '23 10:02 nojb

uJust submitted #11997 for the translation of structured constants bit.

Since changing the representation of identifiers as done here touches code which is also changed by #11997, I plan to wait until we are done with #11997 and it (hopefully) gets merged before opening the PR about identifiers.

shindere avatar Feb 07 '23 12:02 shindere

@nojb , we don't offer support for the internal runtime API either, this still doesn't stop us from caring about eventual breakage there. Moreover, for a module with function sthat look useful in the standard library, there is a non-zero risk that people start to use it independently on how scary is the module name.

@shindere: if the purpose was just to make it is easier to build a proof-of-concept with the module in the standard library, this is perfectly fine. However, the module should really be moved to its own separate library beyond the proof-of-concept stage.

Octachron avatar Feb 07 '23 12:02 Octachron

Many thanks for all your feedback Hugo and for having taken the time to review and done so in such a prompt way.

All your suggestions make a lot of sense to me and I will make sure to integrate them as it goes and to give feedback for each of them (not all of them have been integratedyet).

hhugo (2023/02/07 03:17 -0800):

+(* Translate structured constants *) + +let rec transl_const = function

  • Const_base(Const_int i) -> Obj.repr i
  • | Const_base(Const_char c) -> Obj.repr c
  • | Const_base(Const_string (s, _, _)) -> Obj.repr s
  • | Const_base(Const_float f) -> Obj.repr (float_of_string f)
  • | Const_base(Const_int32 i) -> Obj.repr i
  • | Const_base(Const_int64 i) -> Obj.repr i
  • | Const_base(Const_nativeint i) -> Obj.repr i
  • | Const_immstring s -> Obj.repr s
  • | Const_block(tag, fields) ->
  •  let block = Obj.new_block tag (List.length fields) in
    
  •  let pos = ref 0 in
    
  •  List.iter
    

List.iteri ?

Yes, absolutely, you are right. Thanks. As you can see this has been integrated to #11997.

shindere avatar Feb 07 '23 12:02 shindere

Florian Angeletti (2023/02/07 04:59 -0800):

@shindere: if the purpose was just to make it is easier to build a proof-of-concept with the module in the standard library, this is perfectly fine. However, the module should really be moved to its own separate library beyond the proof-of-concept stage.

What I understand is that it's important that allthe clients use the very same symbol table. I think it is for this reasoon that Stephen has suggested a module in the standard library. I, personnally, am not attached to any particular solution so whatever will work and make people happy will be okay for me.

shindere avatar Feb 07 '23 13:02 shindere

Now that #11997 has been merged, the present PR has been rebased on top of it to keep track of the complete not-yet-merged emancipation of dynlink from compilerlibs history.

Simultaneously, #12031 has just been opened to propose a second bit of this history that may be worth being discussed and reviewed in isolation.

shindere avatar Feb 22 '23 20:02 shindere

I've had an occasion to discuss with @shindere on this subject, and for clarification I'll restate my position here: I'm 100% in favour of reducing the dependencies of Dynlink, but I think that as a library that manipulates artifacts produced by the compiler it will always naturally depend on (a subset of) compiler-libs, and we shouldn't try to hide that. If the only thing we want is to stop building the special copy of compiler-libs for dynlink, we could reconsider linking against compiler-libs normally. Most users should have namespaced their own modules correctly by now, so I think the risks of conflicts are low. And we could always consider namespacing compiler-libs ourselves in a more systematic way (like we did for the stdlib) if we still want to support users with conflicting modules.

lthls avatar Feb 24 '23 15:02 lthls

Many thanks for having taken the time to state your position publicly, Vincent.

I don't know how exactly we should proceed from here, whether e.g. we should organise a short dedicated online meeting, but what I can tell is that it would help if we could make a decision, something I can be reasonably sure we all agree on before continuing to invest time in the project at the risk of doing it in a direction which ends up not being the right one.

shindere avatar Feb 24 '23 15:02 shindere

I'd just like to register my disagreement with Vincent. The dependency on compiler-libs has always been a huge pain, and seems entirely unnecessary. I strongly support this PR.

lpw25 avatar Feb 24 '23 17:02 lpw25

Leo White (2023/02/24 09:21 -0800):

I'd just like to register my disagreement with Vincent. The dependency on compiler-libs has always been a huge pain, and seems entirely unnecessary. I strongly support this PR.

Many thanks, Leo.

The devil is in the details, so I am fully ocnvinced that some parts of the history as it currently is are ugly and will need to be improved, that's where I'd really appreciate any feedback I can receive.

shindere avatar Feb 24 '23 17:02 shindere

A few recent PRs have made the task addressed by this one significantly simpler: #12430 and #12360 that have already been merged and #12599 yet to be merged, to mention just the more important ones.

The present PR has thus been rebased on top of all the previously mentionned ones, leading to a much more straightforward sequence of changes.

In particular, compared to the original approach, the current one does not involve the addition of a module to the standard library and does not duplicate any code either.

The present version of the PR thus consists of four commits that are best reviewed one by one:

  1. Moving the definition of magic numbers to configure. This was already there in the original submission and remains useful to avoid having to update magic numbers in several places in the sources.
  2. Taking advantage of the previous commit, we introduce a dynlink-specific configuration module, Dynlink_config, that embeds the very few configuration values dynlink needs.
  3. Remove all the other dependencies from Dynlink to compilerlibs. This had to be done all at once, making this commit the biggest one of the PR. The definitions of file formats are copied to dynlink-specific modules but the copy is done by configure, which is how manual code duplication is avoided. Also, a minimal version of the Symtable module is provided, that embeds just the features needed by dynlink.
  4. Finally, the last commit of the PR cleans up the build system by stopping to build the dynlink_compilerlibs pack.

shindere avatar Oct 06 '23 07:10 shindere

  1. Moving the definition of magic numbers to configure. This was already there in the original submission and remains useful to avoid having to update magic numbers in several places in the sources.

This change makes me a bit nervous. How does this interact with bootstrapping? What is the workflow for changing magic numbers? I normally build a working compiler, then change the magic numbers and then bootstrap. Will we now need to rerun ./configure before bootstrapping?

nojb avatar Oct 06 '23 07:10 nojb

Nicolás Ojeda Bär (2023/10/06 00:47 -0700):

  1. Moving the definition of magic numbers to configure. This was already there in the original submission and remains useful to avoid having to update magic numbers in several places in the sources.

This change makes me a bit nervous. How does this interact with bootstrapping?

Oh, I have to confess I did not investigate this. One thing I can say is that I did make sure that the magic number of executables as known by the runtime is not the same than the one in configure, so I am expecting it should still be possible to bootstrap!

What is the workflow for changing magic numbers? I normally build a working compiler, then change the magic numbers and then bootstrap. Will we now need to rerun ./configure before bootstrapping?

Yes and that, if kept, will probably require a documentaiton update. The currently implemented workflow is that you modify the magic numbers in build-aux/ocaml_version.m4, then run tools/autogen and then ./configure.

In case that change is problematic, I am assuming the PR could leave it out if it is fine to maintain two versions of the magic numbers, of course at the risk of making mistakes and loosing the coherency between the two copies.

shindere avatar Oct 06 '23 07:10 shindere

Re:the handling of magic numbers, one other approach that could work might be to keep the move to configure but just document that, in the case of a bootstrap, they needto be changed at both several places during the bootstrap: in the m4 filealthough the change won't take effect immediately, and in the generated files for the bootstrap to work.

shindere avatar Oct 06 '23 08:10 shindere

Among the aspects that will IMO require some discussion, there is the way the Symtable module has been split. Currently, much of its code is duplicated and it'd certainly be good to avoid such a duplication as much as possible, given that, as far as symbol table is concerned, it may not be possible to make the toplevel as independent of compilerlibs as dynlink because it more heavily relies on the Lambda module, thus oon compiler internals.

The duplication also concerns Dll and Meta. Together with Symtable, they are responsible for interacting with the runtime (which both the toplevel and dynlink do).

Dll and Symtable are also used in the other context of linking bytecode.

Isolating the "interaction with the runtime" logic that both the toplevel and dynlink rely on from the rest could maybe help understanding how to later cut the dependencies on compiler-libs.

hhugo avatar Jan 15 '24 13:01 hhugo

After the merge of #12652, #12599 was the last missing prerequisite for the present PR. Now that #12599 has been merged, too, the present PR has been rebased on top of latest trunk.

It now consists in only 3 commits that should be easy to review one after the other. The two first commit remove the dependency of Dynlink on compilerlibs (the first one dealing specifically with the Config module whereas the second deals with all the remaining dependencies, which can't be dealt with separately. So, these two comits are the emancipation work in itself. The third commit finally gets rid of the no longer needed Dynlink_compilerlibs pack.

shindere avatar Jan 19 '24 19:01 shindere

As a follow-up to a private conversation with @dra27, I just pushed a revised version of this PR where the first commit includes an update to tools/bump-magic-numbers, so that this script also updates the magic numbers in the new Dynlink_config module.

Many thanks to @dra27 for having spotted this oversight of mine, due to the fact that the original PR predated the introduction of bump-magic-numbers.

shindere avatar Jan 22 '24 13:01 shindere

hhugo (2024/01/15 05:20 -0800):

 Among the aspects that will IMO require some discussion, there is
 the way the Symtable module has been split. Currently, much of its
 code is duplicated and it'd certainly be good to avoid such a
 duplication as much as possible, given that, as far as symbol table
 is concerned, it may not be possible to make the toplevel as
 independent of compilerlibs as dynlink because it more heavily
 relies on the Lambda module, thus oon compiler internals.

The duplication also concerns Dll and Meta. Together with Symtable, they are responsible for interacting with the runtime (which both the toplevel and dynlink do).

I think this problem is much less present with the current implementation.

It is possible that we could reduce code duplication even more, but I am not sure how much effort is worth putitng into that at this stage. Indeed, once this PR has been merged, one further improvement would be to move the Dynlink module to the standard library. If that happens, it will be possible to share some logic between Dynlink and the toplevel and it seems that it is at this stage that there will be the more opportunities to actually deduplicate code.

Dll and Symtable are also used in the other context of linking bytecode.

Isolating the "interaction with the runtime" logic that both the toplevel and dynlink rely on from the rest could maybe help understanding how to later cut the dependencies on compiler-libs.

I think the comment above addresses this.

shindere avatar Mar 18 '24 10:03 shindere

This has finally been rebased on latest trunk and the comments below have been taken into account:

Damien Doligez (2024/03/13 09:02 -0700):

@damiendoligez commented on this pull request.

I'm not giving a formal approval because I don't really understand all the fine details of dynlink, but this looks good to me.

@@ -0,0 +1,26 @@ +(* @configure_input@ ) +#3 "otherlibs/dynlink/dynlink_config.ml.in" +(**************************************************************************) +( ) +( OCaml ) +( ) +( Sebastien Hinderer, Tarides ) +( ) +( Copyright 2016 Institut National de Recherche en Informatique et *)

(*   Copyright 2024 Institut National de Recherche en Informatique et     *)

Fixed, thanks.

@@ -0,0 +1,304 @@ +() +(* ) +( OCaml ) +( ) +( Xavier Leroy, projet Cristal, INRIA Rocquencourt ) +( ) +( Copyright 1996 Institut National de Recherche en Informatique et ) +( en Automatique. ) +( ) +( All rights reserved. This file is distributed under the terms of ) +( the GNU Lesser General Public License version 2.1, with the ) +( special exception on linking described in the file LICENSE. ) +( *) +()

There is a non-trivial amount of duplicated code between this file and bytecomp/{dll,symtable}.ml. How much do they need to be synchronized? Shouldn't you add a comment (to this file and the originals) about it?

As I may have written earlier, it is indeed likely possible to shave the current code and reduce duplication even more. Not sure though there is interest in spedning time doing that at this stage, given that there is still the possibility to move Dynlink into the standard library, whch would open the way to real code deduplication, also with the toplevel.

  • (* Open a list of DLLs, adding them to opened_dlls.
  • Raise [Failure msg] in case of error. *)
    

This comment should be moved next to open_dlls.

Indeed, thanks. Done.

shindere avatar Apr 15 '24 12:04 shindere

Can we just press merge now, or is there something else we are waiting for?

lpw25 avatar Apr 26 '24 14:04 lpw25

How do we make sure changes to symtable and dll get propagated to dynlink ? Do we have enough tests to spot missing updates ? If no, can we improve testing in that area ?

hhugo avatar Apr 26 '24 18:04 hhugo