learn-ocaml
learn-ocaml copied to clipboard
Feature: new option à la `--replace` to leverage this existing feature in a `docker restart` context
Related user(s):
@erikmd, @AltGr
Related issue(s) or PR(s):
- #567
Related project scope(s):
No response
The problem:
- learn-ocaml 1.0.0 's
learn-ocaml build serve --replaceis very nice :100: - but in a docker-compose context, on an exodir update, we don't successively start 2 learn-ocaml procs
(anyway, there's the 1-proc-per-container rule!) - we rather use
docker compose restart, and we would really like to benefit from the--replacesemantics in this case
Wanted solution:
- add an alternative option to
--replace, e.g.--fork-replace, or--smooth-replace
Considered alternatives:
- (add an additional option, passed along with
--replace, e.g.--forkor--quick-restart) - (but issue if
--forkis passed without--replace)
Additional context:
We should think about ENV VARS as well! (e.g., a single environment variable that can be set to {}="0", "1" or "2")
I am not sure the executable itself can do anything in the case when it is containerised in this way (I don't understand what --fork-replace would do ?): AFAIK replacing a containerised server with minimal downtime is one of the core functions of any load-balancer / orchestrator: start new container, wait for it to start accepting connections, start routing incoming connections to it, kill older container -- was'nt that available before I added the option for the stand-alone case ?
Maybe the new option would help with the management of stored state during such transitions ?
Hi @AltGr, thanks a lot for your comment! Indeed, the overall aim is to have a Dockerised server with minimal downtime.
I agree that this may be possible to implement this by using a Docker Swarm or Kubernetes balancer, several instances sharing the www folder, but this looks quite difficult to me to implement though. And in usual teaching context, we only have ~100 students, where a single instance using Docker Compose already scales for that, while the --replace option can't be used in this context IINM. Indeed:
- in general we only have 1 main process per container,
- and if we spin two different instances, the
wwwfolder is not shared between containers, nor the TCP interface (each container has a localhost:8080 address which is local).
So my idea was to support a similar CLI option, to offer this high-availability feature easily within the app's logic, namely:
- at container startup, the default command
learn-ocaml build serve --repo=/repository --fork-replaceis run; - if the
wwwfolder already exist (and should be rebuilt), it forks its own process; the first one serves the existingwww, and the second one behaves aslearn-ocaml build serve --repo=/repository --replace; - if the
wwwfolder doesn't exist, the option--fork-replaceis ignored and no fork occurs.
→ What do you think?
Have a nice WE!
Maybe the new option would help with the management of stored state during such transitions ?
I didn't think about this (it's another topic!): maybe there will also be sth to do to get a safer --replace; I don't know.
Ah! I understand better your intent.
Wouldn't [ -d www ] && learn-ocaml serve & learn-ocaml build serve --replace work then ?
Hi @AltGr! Mostly yes, this could work. (Except that [ -d www ] && … would yield a bad exit code if www doesn't exist.)
Actually when Docker entrypoints come into play, I believe it's always slightly better to avoid spinning a shell, to ensure that the entrypoint binary always receives proper signals (see e.g. https://hynek.me/articles/docker-signals/)
So, what do we decide?
Option 1. Just keep learn-ocaml CLI as is, and change the Docker entrypoint. (I downvote this one because of signals.)
Option 2. Implement --fork-replace with Unix.fork along with the current implementation of --replace.
Option 3. Implement --fork-replace with Unix.fork and with Unix.kill (given we know the PID of the child process)
And if ever Option 2. or 3. looks good to you, would you prefer to implement it, or to review the PR ? :)
After reading https://hynek.me/articles/docker-signals/ I still think signals should work, since we already use dumb-init... it may be worth testing anyway. Otherwise no strong objection to option 3 (maybe with a more high-level name than fork-init, that doesn't convey the intent: something like --serve-right-away would be more explicit).
(also, be very careful with forking when any lwt stuff is going on, it's very easy to have extremely weird stuff happening even if you tried to clean up all lwt threads in the child)
Thanks @AltGr !
After reading https://hynek.me/articles/docker-signals/ I still think signals should work, since we already use dumb-init...
AFAIU, it would be very OK if the shell is ephemeral (using the exec builtin), which is not the case.
it may be worth testing anyway.
Sure, I can volunteer to do more tests.
Otherwise no strong objection to option 3 (maybe with a more high-level name than fork-init, that doesn't convey the intent: something like --serve-right-away would be more explicit).
OK, good point.
(also, be very careful with forking when any lwt stuff is going on, it's very easy to have extremely weird stuff happening even if you tried to clean up all lwt threads in the child)
OK. Would you have a few suggestions in mind, regarding the things to monitor ? maybe, with a dedicated tool ?
And do you think that the potentiel weird things that can occur would be specific to the forking ? or would they already occur with the current implementation of --replace ?
Have a very nice WE!
Ah indeed, the Dockerfile ENTRYPOINT is ["dumb-init","/usr/bin/learn-ocaml","--sync-dir=/sync","--repo=/repository"], which doesn't go through a shell, but we'd need to interspede the snippet between the dumb-init and the learn-ocaml process. Of course we could exec the second run of learn-ocaml but we'll still run into problems if one tries to interrupt while the "temporary server" is still active (and that one will have to be in a subshell anyway, even if we exec it).
I'm still pretty sure it can be done... but the balance of which approach is simplest might be changed :)
As long as you fork before running Lwt_main.run you'll be fine. After that, you can use Lwt_unix.fork but if there is any lwt calls within the child, extremely bad and confusing things will happen. It's just fork, replace just kills an external process, which is safe.