lwt
Forking breaks lwt-io if it happens after Lwt_main.run
If I use Lwt_unix.fork (or Unix.fork) after any successful invocation of Lwt_main.run (even Lwt_main.run (Lwt.return ())), then in all subsequent forks the lwt-io system will not work in the child processes. The peculiar thing here is that this happens only if I do not perform any Lwt-specific stuff in the first fork [1] (i.e., this fork is totally lwt-independent). The version that I am using is 5.6.1.
Here is the code to reproduce; put it into run.ml:
open Lwt.Infix
open Lwt.Syntax

let fork_and_wait () =
  match Lwt_unix.fork () with
  | 0 -> Unix.sleep 1; exit 0
  | child ->
    match Unix.waitpid [] child with
    | _,WEXITED 0 -> ()
    | _ -> assert false

let fork_and_talk () =
  let input,output = Lwt_unix.pipe () in
  match Lwt_unix.fork () with
  | 0 ->
    Lwt_unix.close input >>= fun () ->
    let output = Lwt_io.of_fd ~mode:Output output in
    Lwt_io.write_value output "hello!" >>= fun () ->
    Lwt_io.flush output >>= fun () ->
    Lwt_io.close output >|= fun () ->
    exit 0
  | pid ->
    Lwt_unix.close output >>= fun () ->
    let input = Lwt_io.of_fd ~mode:Input input in
    let* hello = Lwt_io.read_value input in
    Lwt_io.close input >>= fun () ->
    Lwt_io.printl ("fork and talk: " ^ hello) >>= fun () ->
    Lwt_unix.waitpid [] pid >|= function
    | _,WEXITED 0 -> ()
    | _ -> assert false

let just_talk () =
  let input,output = Lwt_unix.pipe () in
  let output = Lwt_io.of_fd ~mode:Output output in
  Lwt_io.write_value output "hello!" >>= fun () ->
  Lwt_io.flush output >>= fun () ->
  Lwt_io.close output >>= fun () ->
  let input = Lwt_io.of_fd ~mode:Input input in
  let* hello = Lwt_io.read_value input in
  Lwt_io.close input >>= fun () ->
  Lwt_io.printl ("just talk: " ^ hello)

let () =
  Lwt_main.run (Lwt.return ()); (* all works if this line is removed *)
  fork_and_wait (); (* or if this one is removed *)
  Lwt_main.run (fork_and_talk ());
  Lwt_main.run (just_talk ());
And the dune file, for your convenience:
(executable
 (name run)
 (libraries lwt lwt.unix))
FWIW, I also tried using the libev engine, with select, poll, and epoll backends; they all exhibit the same behavior.
[1] In terms of the above example, this means that if I remove fork_and_wait, I can call fork_and_talk as many times as I like. The problem appears only with the peculiar combination of running any Lwt_main.run (even the trivial one, which apparently should not have any observable side effects) and then doing a fork (even with Unix.fork) that doesn't touch any lwt-related stuff. Unfortunately, this combination is quite common in large applications.
If you insert Lwt_unix.set_pool_size 0 ; before the first Lwt_main.run it will work. I imagine Lwt starts a worker thread, which will not exist in the child process after the fork. With Lwt < 5, Lwt_unix.(set_default_async_method Async_none) would also have done the trick.
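For reference, here is a minimal sketch of that workaround, condensed from the reproduction above. It assumes lwt and lwt.unix are linked as in the dune file; the names child_says_hello, main and result are made up for this sketch, and I have only checked that it mirrors the original program, not every Lwt version.

```ocaml
open Lwt.Infix

(* Fork a child that sends one value back over a pipe using Lwt_io
   (same shape as fork_and_talk in the reproduction). *)
let child_says_hello () : string Lwt.t =
  let input, output = Lwt_unix.pipe () in
  match Lwt_unix.fork () with
  | 0 ->
    Lwt_unix.close input >>= fun () ->
    let oc = Lwt_io.of_fd ~mode:Lwt_io.Output output in
    Lwt_io.write_value oc "hello!" >>= fun () ->
    Lwt_io.close oc >|= fun () ->
    exit 0
  | pid ->
    Lwt_unix.close output >>= fun () ->
    let ic = Lwt_io.of_fd ~mode:Lwt_io.Input input in
    Lwt_io.read_value ic >>= fun (hello : string) ->
    Lwt_io.close ic >>= fun () ->
    Lwt_unix.waitpid [] pid >|= fun _ ->
    hello

let main () : string =
  (* The suggested workaround: shrink the worker-thread pool to zero
     before the first Lwt_main.run. *)
  Lwt_unix.set_pool_size 0;
  Lwt_main.run (Lwt.return ());
  (* An Lwt-independent fork, as in fork_and_wait. *)
  (match Lwt_unix.fork () with
   | 0 -> exit 0
   | child -> ignore (Unix.waitpid [] child));
  (* The fork that previously could not do Lwt I/O in the child. *)
  Lwt_main.run (child_says_hello ())

let result = main ()
let () = print_endline result
```

The only change relative to the failing program is the set_pool_size call; everything else keeps the problematic ordering (a run, a plain fork, then a fork that talks).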
More generally, I have no idea why Lwt provides Lwt_unix.fork, given that (i) Lwt starts worker threads on encountering blocking system calls and (ii) threads and fork don't mix (all you can do in the child process after fork in a multi-threaded program is call async-signal-safe functions and then exec). Note that Unix.execv* is not thread-safe and cannot be used in multi-threaded programs either.
Possibly Lwt_unix.fork is not thread-safe either, because the documentation says that (as opposed to Unix.fork) "in the child process all pending jobs are canceled", and I doubt that this can be achieved using only async-signal-safe functions.
> If you insert Lwt_unix.set_pool_size 0 ; before the first Lwt_main.run it will work. I imagine Lwt starts a worker thread, which will not exist in the child process after the fork. With Lwt < 5, Lwt_unix.(set_default_async_method Async_none) would also have done the trick.
Yes, our workaround was to use Lwt_unix.(set_default_async_method Async_none) in the child process.
> More generally, I have no idea why Lwt provides Lwt_unix.fork, given that (i) Lwt starts worker threads on encountering blocking system calls and (ii) threads and fork don't mix (all you can do in the child process after fork in a multi-threaded program is call async-signal-safe functions and then exec). Note that Unix.execv* is not thread-safe and cannot be used in multi-threaded programs either.
Yep, in the end we had to switch to the lwt-parallel library instead of using fork directly. This library addresses the issue by creating a process snapshot before lwt is used. This is probably the only safe way of using Lwt_unix.fork with lwt. We pushed some new features to lwt-parallel (including explicit snapshots); see https://github.com/ocaml/opam-repository/pull/22611
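To illustrate the snapshot idea, here is a rough sketch using only the standard unix library. This is not lwt-parallel's API: the names snapshot, start_snapshot, square_in_worker, square and stop are hypothetical, and the "work" is just squaring an integer. The point is only that the helper process is forked while the program is still single-threaded and Lwt-free, and that all later forks happen from that clean helper rather than from the main process. It assumes OCaml 4.12 or later for Unix._exit.

```ocaml
(* Hypothetical sketch of a "process snapshot"; needs only the unix library. *)

type snapshot = { pid : int; requests : out_channel; replies : in_channel }

(* Runs inside the snapshot process: fork a worker that computes the
   answer and writes it to the reply pipe, then wait for the worker. *)
let square_in_worker (n : int) (oc : out_channel) : unit =
  match Unix.fork () with
  | 0 ->
    Marshal.to_channel oc (n * n) [];
    flush oc;
    (* _exit skips at_exit hooks, so nothing inherited from the parent runs. *)
    Unix._exit 0
  | worker -> ignore (Unix.waitpid [] worker)

(* Must be called before Lwt_main.run and before any thread is created. *)
let start_snapshot () : snapshot =
  let req_r, req_w = Unix.pipe () in
  let rep_r, rep_w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    Unix.close req_w;
    Unix.close rep_r;
    let ic = Unix.in_channel_of_descr req_r in
    let oc = Unix.out_channel_of_descr rep_w in
    (try
       while true do
         let (n : int) = Marshal.from_channel ic in
         square_in_worker n oc
       done
     with End_of_file -> ());
    Unix._exit 0
  | pid ->
    Unix.close req_r;
    Unix.close rep_w;
    { pid;
      requests = Unix.out_channel_of_descr req_w;
      replies = Unix.in_channel_of_descr rep_r }

(* Ask the snapshot to fork a worker; the main process itself never forks. *)
let square (snap : snapshot) (n : int) : int =
  Marshal.to_channel snap.requests n [];
  flush snap.requests;
  Marshal.from_channel snap.replies

(* Closing the request pipe makes the snapshot see end-of-file and exit. *)
let stop (snap : snapshot) : unit =
  close_out snap.requests;
  ignore (Unix.waitpid [] snap.pid);
  close_in snap.replies

let () =
  let snap = start_snapshot () in
  (* Lwt_main.run and worker threads may safely start from here on. *)
  print_endline (string_of_int (square snap 7));
  stop snap
```

A real library has to deal with much more (concurrent requests, errors, passing closures or channels to the worker); this only shows why forking from a process that never ran Lwt sidesteps the problem discussed in this thread.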