learn-ocaml icon indicating copy to clipboard operation
learn-ocaml copied to clipboard

`learn-ocaml build -j 2` is broken

Open erikmd opened this issue 4 years ago • 3 comments

Bug description

It seems there's a race condition in learn-ocaml build that prevents us from safely using the --jobs option.

To Reproduce

Steps to reproduce the behavior:

  • Clone learn-ocaml (git clone https://github.com/ocaml-sf/learn-ocaml.git)
  • Run cd learn-ocaml
  • Docker-pull learn-ocaml master (docker pull ocamlsf/learn-ocaml:master)
  • Run docker run --rm -it -v "$PWD/demo-repository:/repository" ocamlsf/learn-ocaml:master -j 2

Then, there is high probability that we get:

Learnocaml v.0.12 running.
Updating app at ./www
Cannot process exercises: Unix.Unix_error(Unix.EEXIST, "mkdir", "./www/static")

if the repository contains 2+ exercises (which is now the case of demo-repository).

Expected behavior

No error should happen.

erikmd avatar Aug 27 '21 21:08 erikmd

As an aside, when this bug will be fixed:

we may want to add a -j 2 (or -j $(nproc)) flag in the biggest, learn-ocaml-corpus based, test of our CI to get a speed-up…

erikmd avatar Aug 27 '21 21:08 erikmd

I have examined this a bit, and it attempts to use Lwt_unix.fork() which just can't be made to work reliably (see e.g. https://github.com/ocsigen/lwt/issues/812 and there are others like it) ; duplicating the Lwt state in the child process is just going to result in a mess, I am even puzzled that there appears to have been an Lwt version where it worked at some point.

So the option should just be disabled at this point. The more reasonable implementation could be:

  • define a dedicated CLI command to compile a specific exercise
  • run a whole process with that command inside the Lwt_list.map_p on each exercise, using e.g. Lwt_process.exec. The difficulty is to make sure that all relevant options in the current context are properly passed to the child process.

Or, of course, we could wait for multicore which should make it fairly trivial :rocket:

AltGr avatar Sep 17 '21 16:09 AltGr

Hi @AltGr, thanks a lot for investigating, and for these pointers.

Or, of course, we could wait for multicore which should make it fairly trivial

Yes, that sounds a good trade-off → and replacing the learn-ocaml build -j $n option with an error, or just with (a -j 1 fallback + a warning)… I don't know what'd be more sensible 🤔

BTW, I've just skimmed the open issues, and it seems this had been spotted some time ago; (just putting the URL here to link both issues: https://github.com/ocaml-sf/learn-ocaml/issues/158)

erikmd avatar Sep 17 '21 22:09 erikmd