
Fast native toplevel using JIT

Open ghost opened this issue 5 years ago • 26 comments


Overview

We (Jane Street + OCL/Tarides) would like to make the native toplevel faster and more self-contained.

At the moment, the native toplevel works by calling the assembler and linker for each phrase. This makes it slow and dependent on an external toolchain, which is not great for deployment.

To reach this goal, we would like to bring the existing JIT work on ocamlnat by Marcell Fischbach and Benedikt Meurer up to date and merge it into the compiler.

Motivation

This work would provide a simple way to compile and execute OCaml code at runtime, which would unlock a lot of new possibilities to develop great new tools.

Coupled with the fact that we can already embed cmi files into an executable, this work would make it possible to distribute a self-contained binary that can evaluate OCaml code at runtime. This would make it simple and straightforward to use OCaml as an extension language.

Verified examples in documentation comments

We are particularly interested in this feature for the mdx tool. More precisely, we are currently working on a feature allowing verified toplevel snippets in mli files. For instance:

(** [chop_prefix s ~prefix] returns [s] without the leading [prefix].

    {[
      # chop_prefix "abc" ~prefix:"a";;
      - : string option = Some "bc"
      # chop_prefix "abc" ~prefix:"x";;
      - : string option = None
    ]}
*)
val chop_prefix : string -> prefix:string -> string option

In the above example, the {[ ... ]} would be kept up to date by mdx to ensure that the document stays in sync when the code changes. In fact, the user would initially only write the # lines and mdx would insert the results just as with expectation tests.

The change in detail

This change would add JIT code generation for x86 architectures, as described in Fischbach and Meurer's paper. For other architectures, we would still rely on the portable method of calling the assembler and linker. The main additions to the compiler code base would be:

  • some code in the backend to do the assembly in process
  • a few more C functions to glue things together

The paper mentions that it adds 2300 lines of OCaml+C code to the compiler code base.

One detail to mention: IIUC, the JIT ocamlnat from the paper goes directly from the linear form to binary assembly. Now that we have a symbolic representation of the assembly, we could instead start from the symbolic assembly, in order to share more logic between normal compilation and the JIT.

We discussed this with @alainfrisch and @nojb, since LexiFi has been using an in-memory assembler in production for a while. They mentioned that they would be happy to open-source the code if they can, which means that we could reuse code that has been running in production for a long time and is likely to be well tested and correct.

LexiFi's binary emitter is about 1800 lines of code including comments and newlines. This looks a bit smaller than the JIT part of the JIT ocamlnat, so we would still be adding approximately the same amount of code if we went this way.

Drawback

This is one more feature to maintain in the compiler, and it comes with a non-negligible amount of code. However, especially if we can reuse LexiFi's in-memory assembler, most of the additions would come from well-tested code. @alainfrisch and @nojb also mentioned that this code was very low-maintenance and had pretty much not changed in 5 years.

Alternatives

For the mdx case, we considered a few alternatives.

Using a bytecode toplevel

Mdx currently uses a bytecode toplevel where everything is compiled and executed in byte-code. This includes:

  • code coming from user libraries
  • the full compilation of the toplevel phrases

As a result, mdx is currently very slow: the round-trip time between the user saving a file and seeing the result easily climbs into the tens of seconds.

In the case of Jane Street, we have one more difficulty with this method: a lot of our code doesn't work at all in bytecode because we never use bytecode programs.

Staging the build

Given that mdx is a build tool, one alternative is to redesign the interaction between mdx and the build system. For instance, it could be done in stages, with a first step where mdx generates some code that is then compiled and executed normally by the build system. This is how the cinaps tool works, for instance.

However, it is difficult to faithfully reproduce the behavior of the toplevel with this method. What is more, such a design is tedious and requires complex collaboration between the tool and the build system.

Going through this amount of complexity for every build tool that wants to compile OCaml code on the fly doesn't feel right.

Using a mixed native/bytecode mode

One idea we considered is using a mixed mode where a native application can execute bytecode. This would work well for us as the snippets of code we evaluate on the fly are always small and fast.

However, it is completely new work while the native JIT has already been done. What is more, while it would work for us it might not work for people who care about the performance of the code evaluated on the fly.

A native JIT would likely benefit more people.

ghost avatar Apr 06 '20 11:04 ghost

To summarize, your proposal is as follows:

  1. Integrate LexiFi's work on direct binary generation into the upstream compiler.
  2. Add the necessary linker logic to use it from ocamlnat.

This sounds like a very reasonable approach to me. (I had this in mind when I replied to your earlier emails but never formulated it clearly, sorry.)

Minor comment: The way this RFC references earlier work by Marcell Fischbach and Benedikt Meurer is slightly confusing; I'm not sure you would reuse much of their work (except the parts that have already been upstreamed, typically the linear-scan register allocator). In particular their suggestion to have a jit.ml that duplicates each emit.mlp file is not convincing for long-term maintenance but you also don't need it now that the x86_64 backend has an abstract assembler representation: you should be able to call emit, and generate code directly from there (I guess this is what the Lexifi patch does).

gasche avatar Apr 06 '20 13:04 gasche

Indeed. I guess the only part of Marcell Fischbach and Benedikt Meurer's work we would reuse is the C code, which I'm assuming is independent of how the assembly is generated.

ghost avatar Apr 06 '20 13:04 ghost

To clarify: what we have is a way to generate machine code (+ relocation information) from the "x86 assembly AST" (introduced to share code between the two supported assembly syntaxes). Currently, we dump this machine code with a COFF emitter to produce .obj files, but for the use case discussed here, we'd need to write a dynamic code loader that works directly from the generated machine code, relocations and symbol tables (i.e. put the code in executable pages and apply the relocations). This should be rather simple I think (and is perhaps covered by the "C code" from Marcell Fischbach and Benedikt Meurer's work).

alainfrisch avatar Apr 06 '20 15:04 alainfrisch

We discussed this quickly at the last OCaml developer meeting. There are a few questions around the portability of writing to executable memory.

We are now going to build a prototype using LexiFi's binary code emitter and test it on various platforms (Linux, macOS, BSD and Windows) in order to get a clearer picture of the difficulties. Once this is done, we will discuss this proposal further with the rest of the dev team.

ghost avatar Apr 21 '20 11:04 ghost

To the best of my knowledge, the "LexiFi binary code emitter" was, in large part, written by me at OCamlPro, for LexiFi. It extends the COFF linker written by Alain with an x86/amd64 in-memory assembler (i.e. Intel symbolic assembly, 32/64-bit, to binary code) and an ELF linker for Linux. The code emitter was also included in ocp-memprof and ocpwin, to generate OCaml native code in a cross-toolchain way. It would probably be more efficient to ask all the authors of the original work if the decision is taken to include this work in OCaml.

lefessan avatar Sep 10 '20 09:09 lefessan

Hi Fabrice, happy to discuss. I'm going to follow up by email to find a time.

ghost avatar Sep 22 '20 16:09 ghost

Hello, are there any updates on the progress?

qubit55 avatar Mar 11 '21 20:03 qubit55

@entrust1234 hi, please don't post 'any update?' comments on issues, it spams everyone who is subscribed. You can subscribe to the issue to receive updates. Thanks!

yawaramin avatar Mar 11 '21 20:03 yawaramin

FTR, I'm no longer driving the project. My colleague @mshinwell took over. I'll let him and/or @NathanReb comment, but what I heard from them about the JIT was positive :)

ghost avatar Mar 30 '21 09:03 ghost

A quick update on the JIT for the native toplevel:

We have a working prototype, implemented as a library outside of the compiler. It requires a couple of simple hooks to be added to Opttoploop (soon to be the unified Toploop) and to expose some of the existing types and functions defined there, but it is all fairly minimal. Excluding LexiFi's x86 binary emitter, it's about 1-1.5k lines of code at the moment.

The library provides a Jit.init_top : unit -> unit function that uses the above-mentioned hooks to set up the JIT in the native toplevel. You can then use Opttoploop or Opttopmain as you normally would and benefit from the JIT instead of the external assembler and linker + dynlink.
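For illustration, a minimal driver using that entry point could look like the sketch below. This is not code from the prototype: the Jit module name comes from the description above, and Opttopmain.main is assumed to have the usual unit -> int signature of the toplevel drivers.

    (* Hypothetical sketch: install the JIT through the Opttoploop hooks,
       then run the native toplevel driver as usual. *)
    let () =
      Jit.init_top ();            (* set up the JIT via the new hooks *)
      exit (Opttopmain.main ())   (* run the native toplevel normally *)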

We're now working on a branch of MDX using the JIT so we can test it on real-world use cases such as Real World OCaml or Jane Street's internal code base, making sure it works as intended and that the performance gain is what we expect.

If that goes well, we'll move on and start upstreaming the changes we need in the native toplevel, hopefully making the JIT available for OCaml 4.13!

NathanReb avatar Mar 31 '21 09:03 NathanReb

Thanks for the change! I still think this is a very nice project and I'm glad to get the update.

In the interest of starting the bikeshedding early: I'm not sure about the "Jit" name because (1) today people associate JITs with dynamic-recompiling implementations, and not just on-demand code emission, so it comes with a lot of assumptions/associations that are not realized here, and (2) the previous toplevel was already "just in time" in the same sense as your prototype; the main difference is whether you go through external tools or emit binary (encoded assembly) directly. I don't think we should debate this right now, but maybe in the next few weeks/months you may have ideas for alternate names.

gasche avatar Mar 31 '21 09:03 gasche

If that goes well, we'll move on and start upstreaming the changes we need in the native toplevel, hopefully making the JIT available for OCaml 4.13!

Does this mean the native toplevel will be as usable as the bytecode one?

bikallem avatar Apr 28 '21 09:04 bikallem

@NathanReb is code unload part of the current JIT implementation?

EduardoRFS avatar Apr 28 '21 15:04 EduardoRFS

It won't be, although I've had some thoughts as to how to do it.

mshinwell avatar Apr 28 '21 15:04 mshinwell

@mshinwell if you have time, please share; I'm interested in it for Tezos. I got an example working, but only if the code has no data references (I can ensure that by validating the cmm).

https://github.com/EduardoRFS/ocaml-jit-example

EduardoRFS avatar Apr 28 '21 15:04 EduardoRFS

I haven't thought about this for literally years, so my memory is hazy. However the general idea was the following.

The most difficult problem is probably that, before unloading, you need to make sure there aren't any left-over code pointers into the dynamically-loaded/generated code. I think the problematic places these could occur are on the stack, in live physical registers or in the OCaml heap.

The stack (and all thread stacks) could be scanned to ensure there are no references into the relevant (i.e. dynamically-loaded/generated) code areas before unloading; if a reference is found, the unloading could be tried later. I think there are various different cases here:

  • there might have been a register spill of one of the relevant code pointers (unlikely but has been seen to happen in the past)
  • there might be a return address on the stack pointing into the relevant area
  • the program counter might actually be in one of the functions in the relevant code areas.

Live physical registers could be scanned in a similar way, using the existing liveness information.

For the heap, the places the code pointers might occur (assuming no Obj.magic tricks etc) are in blocks with tag Closure_tag. I was thinking of having some means by which we could determine when these, for the given dynamically-loaded/generated unit, have become unreachable in the heap (discarded at a minor GC, or swept at a major GC). This is tricky but I wondered if it could be done by instrumenting the compiler's code generation for closures in these compilation units, such that there is always an extra environment field pointing at a unique block (one per dynamically-loaded/generated unit), with something like a finaliser on that unique block. The aim would be to arrange that when the finaliser is called, it is safe to unload the relevant code immediately.
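To make the finaliser part of this concrete, here is a minimal OCaml sketch of the idea, written as ordinary user code rather than compiler-generated code. The names make_unit_sentinel, make_closure_of_unit and unload_unit are hypothetical, the sentinel is captured explicitly instead of through a hidden environment field, and the stack/register scan described above is not modelled.

    (* One unique "sentinel" block per dynamically-loaded/generated unit,
       with a finaliser that fires once the heap no longer keeps the
       sentinel alive. *)
    type sentinel = { unit_name : string }

    let make_unit_sentinel unit_name ~unload_unit =
      let s = { unit_name } in
      (* When [s] is collected, this sketch assumes it is safe to unmap the
         unit's code pages. *)
      Gc.finalise (fun s -> unload_unit s.unit_name) s;
      s

    (* Stand-in for a closure from the dynamically-loaded unit: it keeps
       the sentinel alive for as long as the closure itself is reachable. *)
    let make_closure_of_unit sentinel f =
      fun x ->
        ignore (Sys.opaque_identity sentinel);
        f x

    let () =
      let unload_unit name = Printf.printf "unloading code of %s\n%!" name in
      let use_unit () =
        let s = make_unit_sentinel "Dyn_unit_1" ~unload_unit in
        let g = make_closure_of_unit s (fun x -> x + 1) in
        Printf.printf "g 1 = %d\n%!" (g 1)
      in
      use_unit ();
      (* After [use_unit] returns, nothing references the sentinel, so a
         major collection may run the finaliser (exact timing is up to the
         GC). *)
      Gc.full_major ()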

The other area of difficulty concerns statically-allocated data, as you mentioned. Maybe we could just not statically allocate anything for these dynamically-loaded modules.

I tend to think there is probably a more general solution involving specific GC regions for each dynamically loaded/generated compilation unit, though the GC doesn't support anything like this at present.

mshinwell avatar Apr 28 '21 16:04 mshinwell

P.S. In fact the code pointers scheme above relies on all closures in the dynamically-loaded/generated units being dynamically allocated, otherwise the finaliser will never be called.

mshinwell avatar Apr 28 '21 16:04 mshinwell

@NathanReb would you by chance have some information on the current status of the native-toplevel revival? The "unify the toploop implementations" part was done (in large part) in #10124. Were people able to test the native toplevel inside mdx?

gasche avatar Sep 01 '21 17:09 gasche

We indeed tested it. The work is available on github and is briefly documented here. It relies on a few forks atm:

  • Patched 4.11 compiler: https://github.com/NathanReb/ocaml/tree/jit-hook-411
  • Ocamlfind metadata for the native toplevel: https://github.com/NathanReb/compiler-libs-opttoplevel
  • Patched topfind library for native toplevel support: https://github.com/NathanReb/opttopfind
  • Patched MDX using the native toplevel with the JIT: https://github.com/realworldocaml/mdx

I tried to provide clear information on how to set all this up in the various repos so you should be able to try it fairly easily. Please reach out to me if anything needs to be clarified!

While working on this we also spotted differences between the native and bytecode toplevels that need to be fixed on the native toplevel side. These are showcased in the ocaml-jit test suite.

There also seems to be an issue with how .cmxs are built by default that caused trouble when trying to dynamically load them. We use a patched version of dune to work around it but according to @mshinwell and @jeremiedimino this fix belongs directly in the compiler rather than in dune.

Next steps from here are a few patches to the compiler and toplevel libraries:

  • Build and install the native toplevel libraries (not ocamlnat) by default
  • Add the required hooks to the toplevel
  • Fix .cmxs building
  • Fix the native toplevel to bring it back in line with the bytecode toplevel

I'll be working on those very shortly as we'd very much like to get this into 4.14!

NathanReb avatar Sep 02 '21 08:09 NathanReb

I'm curious about the current status of the project. Any news?

gasche avatar Jul 16 '22 20:07 gasche

The compiler work for this was released with 4.14:

  • https://github.com/ocaml/ocaml/pull/10690 installs ocamltoplevel.cmxa by default on natdynlink-enabled systems and means that for 4.14+ the native-toplevel findlib package can be assumed to be available on any compiler installation which has natdynlink support. Previously, this required special opam switches.
  • https://github.com/ocaml/ocaml/pull/10715 added hooks into the toplevel which allow the points where ocamlnat would typically call the external assembler and linker to be replaced with user-provided functions.

Additionally there were two fixes to bring the behaviour of ocaml and ocamlnat closer together, both of which were identified from testing the new ocaml-jit inside Jane Street:

  • https://github.com/ocaml/ocaml/pull/10712 tweaks the way ocamlnat captures the result of expressions to ensure that type variable names are not lost (this was a consequence of the way the type-checker works combined with the fact that ocamlnat has to transform expressions to give them a name: the fix was to move the transformation to after the expression has been type-checked, which means that ocaml and ocamlnat now always type-check the same thing and so, unsurprisingly, get the same answer).
  • https://github.com/ocaml/ocaml/pull/10849 added support to ocamlnat for displaying output for wildcard bindings (e.g. let _ : <type> = <expression>), which previously produced no output (whereas ocaml does display the result; see the illustrative session below).
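For example (a hand-written session, not taken from the PR), both toplevels now print a result for a wildcard binding:

    # let _ : int = 1 + 1;;
    - : int = 2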

All this work, I believe, is being used internally at Jane Street (rebased onto OCaml 4.12) with a customised version of the mdx tool. I believe @NathanReb and @Leonidas-from-XIV are aiming to release a version of mdx using ocaml-jit (and so using native mode to interpret mdx documents) in the next couple of months.

dra27 avatar Jul 17 '22 08:07 dra27

Thanks for the news.

One side benefit I hoped for from this project is to get a usable ocamlnat toplevel (installed) for all users (I assume that this means upstreaming the native-code-emission logic at some point, but maybe there is a different way). Is this on the roadmap?

gasche avatar Jul 17 '22 09:07 gasche

Sorry to jump in this late, but why has this PR been closed?

The proposal looked interesting, and there is no justification given for closing the PR.

Could we either re-open it, or explain clearly why it has been closed?

shindere avatar Aug 29 '22 13:08 shindere

I think it's because Jérémie's github account has been deleted.

mshinwell avatar Aug 29 '22 13:08 mshinwell

Mark Shinwell (2022/08/29 06:50 -0700):

Reopened #15.

Okay, thanks. And thanks a lot for having re-opened it; the topic seems interesting and worth exploring to me.

shindere avatar Oct 11 '22 07:10 shindere

My understanding is that currently the project is on hold. @dra27 took care of upstreaming the necessary hooks to be able to implement a JIT outside the compiler, which is enough for the mdx use-case, and there has been no more work intended for upstreaming. (At ICFP last September @dra27 mentioned tweaking the installation status of ocamlnat and support binaries iirc.)

Personally I hope that we will eventually get native binary emission in the compiler upstream (or maybe in a well-identified external library), as we discussed when the RFC was originally written, for example reusing the LexiFi code -- the discussion of this is a large part of the RFC. I think this would be especially useful in combination with MetaOCaml, and in general an excellent contribution for the whole ecosystem, not just mdx. (It also comes with delicate questions of code maintenance etc.)

But Jérémie is not working on this anymore, and I don't know if the remaining people are interested in doing the extra work to make the project more widely useful.

gasche avatar Oct 11 '22 07:10 gasche