Exception escaping libev main loop in 5.9.1.
Since upgrading to 6.1.1, we often see this exception killing the main Lwt loop:
IO error: Unix.Unix_error(Unix.ECONNRESET, "write", "")
Raised by primitive operation at Lwt_engine.libev#iter in file "src/unix/lwt_engine.ml", line 189, characters 6-24
Re-raised at Lwt_engine.libev#iter in file "src/unix/lwt_engine.ml", line 192, characters 6-15
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 36, characters 6-49
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 106, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 112, characters 4-13
Called from Dune__exe__Main.command in file "bin/src/main.ml", line 214, characters 2-19
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 22, characters 12-19
Called from Cmdliner_term.term_result.(fun) in file "cmdliner_term.ml", line 48, characters 25-32
Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 35, characters 37-44
Am I correct in assuming that this should never happen and requires a fix in the libev backend? Given some pointers, I'd be happy to try to fix it.
To elaborate, I do not think the Lwt upgrade caused this, because reverting to 5.8.x does not fix the issue. I think this is due to an upgrade of cohttp. The EPIPE escapes Cohttp.Client.call and is properly handled, but it looks like the next iteration of the libev main loop immediately re-raises that exception.
While it's the interaction with cohttp that causes this, it still seems to me that this is a bug in the libev backend, since my understanding is that exceptions should only ever fail promises or be passed to async_exception_hook.
After more investigation, I can confirm that the exception is raised only once, is properly caught by the LWT engine and transmitted to the promise, where it's handled with Lwt.catch and neutralized. Yet it re-escapes the main loop immediately after, on what I suppose is the next iteration.
Below is a collection of backtraces from the raise point that triggered this. Apart from Lwt internals, tls-lwt always seems to be involved, although this could be a coincidence since most of our I/O uses TLS. I have tried reverting tls-lwt to a previous version, to no avail. I'm still unsure whether this is necessarily an issue in Lwt or whether the culprit could be one of the *-lwt dependencies.
# This is a manual print & backtrace dump from the `raise` point
XXX cohttp-lwt-unix debug: raise IO_error from wrap_write: Unix.Unix_error(Unix.EPIPE, "write", "")
Raised by primitive operation at Lwt_unix.write.(fun) in file "src/unix/lwt_unix.cppo.ml", line 717, characters 39-67
Called from Lwt_unix.wrap_syscall.(fun) in file "src/unix/lwt_unix.cppo.ml", line 571, characters 17-28
Re-raised at Lwt_unix.wrap_syscall.(fun) in file "src/unix/lwt_unix.cppo.ml", line 585, characters 4-17
Called from Tls_lwt.Lwt_cs.naked in file "lwt/tls_lwt.ml", line 19, characters 4-19
Called from Tls_lwt.Lwt_cs.write in file "lwt/tls_lwt.ml" (inlined), line 24, characters 14-56
Called from Tls_lwt.Lwt_cs.write_full in file "lwt/tls_lwt.ml", line 32, characters 6-51
Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
Re-raised at Tls_lwt.Unix.read_t.recording_errors.(fun) in file "lwt/tls_lwt.ml", line 116, characters 12-27
Called from Tls_lwt.of_t.(fun) in file "lwt/tls_lwt.ml", line 337, characters 17-41
Called from Lwt_io.perform_io in file "src/unix/lwt_io.ml", line 236, characters 10-35
Called from Lwt_io.Primitives.unsafe_write_from' in file "src/unix/lwt_io.ml", line 882, characters 6-22
Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
# This is the handling of the exception. It does *not* reraise it.
ℹ 2025-08-12 22:30:24 [schematic-http]
retryable IO error requesting https://o525661.ingest.sentry.io/api/6699179/store/: Unix.Unix_error(Unix.EPIPE, "write", "")
# Yet the exception re-escapes the main loop just after
routine-api: internal error, uncaught exception:
IO error: Unix.Unix_error(Unix.EPIPE, "write", "")
Raised by primitive operation at Lwt_engine.libev#iter in file "src/unix/lwt_engine.ml", line 189, characters 6-24
Re-raised at Lwt_engine.libev#iter in file "src/unix/lwt_engine.ml", line 192, characters 6-15
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 36, characters 6-49
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 106, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 112, characters 4-13
Called from Dune__exe__Main.command in file "bin/src/main.ml", line 214, characters 2-19
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 22, characters 12-19
Called from Cmdliner_term.term_result.(fun) in file "cmdliner_term.ml", line 48, characters 25-32
Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 35, characters 37-44
XXX cohttp-lwt-unix debug: raise IO_error from wrap_write: Unix.Unix_error(Unix.EPIPE, "write", "")
Raised by primitive operation at Lwt_unix.write.(fun) in file "src/unix/lwt_unix.cppo.ml", line 717, characters 39-67
Called from Lwt_unix.retry_syscall in file "src/unix/lwt_unix.cppo.ml", line 509, characters 13-24
Re-raised at Tls_lwt.Unix.read_t.recording_errors.(fun) in file "lwt/tls_lwt.ml", line 116, characters 12-27
Called from Lwt.Sequential_composition.catch.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 2041, characters 16-21
XXX cohttp-lwt-unix debug: raise IO_error from wrap_write: Unix.Unix_error(Unix.EPIPE, "Tls_lwt.write", "")
Raised at Tls_lwt.Lwt_cs.naked.(fun) in file "lwt/tls_lwt.ml", line 22, characters 18-64
Called from Tls_lwt.Lwt_cs.write in file "lwt/tls_lwt.ml" (inlined), line 24, characters 14-56
Called from Tls_lwt.Lwt_cs.write_full in file "lwt/tls_lwt.ml", line 32, characters 6-51
Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
Re-raised at Tls_lwt.Unix.read_t.recording_errors.(fun) in file "lwt/tls_lwt.ml", line 116, characters 12-27
Called from Tls_lwt.of_t.(fun) in file "lwt/tls_lwt.ml", line 337, characters 17-41
Called from Lwt_io.perform_io in file "src/unix/lwt_io.ml", line 236, characters 10-35
Called from Lwt_io.Primitives.unsafe_write_from' in file "src/unix/lwt_io.ml", line 882, characters 6-22
Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
XXX cohttp-lwt-unix debug: raise IO_error from wrap_write: Unix.Unix_error(Unix.EPIPE, "write", "")
Raised by primitive operation at Lwt_unix.write.(fun) in file "src/unix/lwt_unix.cppo.ml", line 717, characters 39-67
Called from Lwt_unix.wrap_syscall.(fun) in file "src/unix/lwt_unix.cppo.ml", line 571, characters 17-28
Re-raised at Lwt_unix.wrap_syscall.(fun) in file "src/unix/lwt_unix.cppo.ml", line 585, characters 4-17
Called from Tls_lwt.Lwt_cs.naked in file "lwt/tls_lwt.ml", line 19, characters 4-19
Called from Tls_lwt.Lwt_cs.write in file "lwt/tls_lwt.ml" (inlined), line 24, characters 14-56
Called from Tls_lwt.Lwt_cs.write_full in file "lwt/tls_lwt.ml", line 32, characters 6-51
Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
Re-raised at Tls_lwt.Unix.read_t.recording_errors.(fun) in file "lwt/tls_lwt.ml", line 116, characters 12-27
Called from Tls_lwt.of_t.(fun) in file "lwt/tls_lwt.ml", line 337, characters 17-41
Called from Lwt_io.perform_io in file "src/unix/lwt_io.ml", line 236, characters 10-35
Called from Lwt_io.Primitives.unsafe_write_from' in file "src/unix/lwt_io.ml", line 882, characters 6-22
Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
Thanks for the report, I'm looking into this.
What OCaml version are you using for this? There is a small chance that the backtrace is wrong because of the way backtraces are handled in OCaml (they are not attached to the exception itself, they are a stateful part of the runtime) and they could be subtly wrong in different ways in different versions of OCaml.
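A quick illustration of why backtraces can mislead: in OCaml the backtrace is runtime state, so a handler that runs other exception-raising code before re-raising may end up reporting the wrong trace. The sketch below uses only the stdlib `Printexc` module; `failing_write`, `noisy_log` and `guarded_write` are hypothetical stand-ins, not Lwt or cohttp code. It shows the defensive pattern of capturing the raw backtrace first and re-raising with `Printexc.raise_with_backtrace`.

```ocaml
(* Backtraces are only recorded when enabled (or with OCAMLRUNPARAM=b). *)
let () = Printexc.record_backtrace true

exception Io_error of string

(* Hypothetical low-level operation that fails. *)
let failing_write () : unit = raise (Io_error "write")

(* Hypothetical logging step that raises and handles an exception
   internally; doing so may replace the runtime's recorded backtrace. *)
let noisy_log (msg : string) : unit =
  try failwith ("log: " ^ msg) with Failure _ -> ()

let guarded_write () : unit =
  try failing_write ()
  with exn ->
    (* Capture first: the backtrace is runtime state, not part of [exn]. *)
    let bt = Printexc.get_raw_backtrace () in
    noisy_log (Printexc.to_string exn);
    (* Re-raise with the captured trace instead of a plain [raise exn]. *)
    Printexc.raise_with_backtrace exn bt

(* Usage: report the exception together with the trace it was re-raised with. *)
let () =
  match guarded_write () with
  | () -> ()
  | exception exn ->
      let bt = Printexc.get_raw_backtrace () in
      prerr_endline (Printexc.to_string exn);
      Printexc.print_raw_backtrace stderr bt
```

The trace is only populated when recording is enabled (here via `Printexc.record_backtrace true`, or `OCAMLRUNPARAM=b`), and useful traces from native code generally also require building with debug information (`-g`).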
(answering for @mefyl because he's afk)
This is on OCaml 5.3.0
@raphael-proust do you know if this invalidates our backtraces? Just trying to figure out if my initial diagnostic is correct, or if this could be something entirely unrelated to lwt.
Sorry, I haven't looked at this recently.
I'm setting some time aside next week to investigate.
I've looked into it a little bit. I still don't have a full picture in my mind yet but here are preliminary findings:
- Lwt does let exceptions from libev escape. It has done so forever. Specifically, in `src/unix/lwt_engine.ml`, in `class libev`, in `method iter` (which is where the exception escapes from according to your stack trace), the exception coming from libev is re-raised. This code has not changed since at least 3.0.0 (there seem to have been some filename changes or some other reorganisation which makes it not immediately trivial to compare further back).
- The main loop does not catch exceptions from the engine. It hasn't done so since at least 4.0.0 (same re: reorganisation but different version).
- The exception management happens in `src/unix/lwt_unix.cppo.ml`. All calls to `Lwt_engine` which add hooks that get `iter`-ed on are protected by `try`-`with` because they all lead back to `retry_syscall`.
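The control flow described above can be modelled in a few lines of stdlib OCaml. This is a deliberately tiny toy, not Lwt's actual code: `engine`, `on_event`, `iter` and `run` are hypothetical names. The engine's `iter` invokes registered callbacks without a handler, and the main loop does not catch either, so a single callback that lets an exception through takes down `run`, even though another callback handled the same kind of error itself.

```ocaml
(* Toy model of an event engine: NOT Lwt's implementation. *)
exception Io_error of string

type engine = { mutable pending : (unit -> unit) list }

let create () = { pending = [] }

(* Register a callback to run on the next iteration. *)
let on_event engine f = engine.pending <- engine.pending @ [ f ]

(* Run every pending callback once. Nothing here catches: an exception
   raised by a callback propagates out of [iter] to its caller. *)
let iter engine =
  let ready = engine.pending in
  engine.pending <- [];
  List.iter (fun f -> f ()) ready

(* Main loop with no handler around [iter]. *)
let run engine is_done =
  while not (is_done ()) do
    iter engine
  done

(* Returns [Ok ()] if the loop finished, [Error msg] if an exception escaped. *)
let demo () =
  let engine = create () in
  let finished = ref false in
  (* Well-behaved callback: handles its own failure. *)
  on_event engine (fun () ->
      try raise (Io_error "EPIPE") with Io_error _ -> ());
  (* Unprotected callback, e.g. a hook registered directly on the engine. *)
  on_event engine (fun () -> raise (Io_error "EPIPE"));
  (* Never reached: the previous callback aborts this iteration. *)
  on_event engine (fun () -> finished := true);
  match run engine (fun () -> !finished) with
  | () -> Ok ()
  | exception Io_error msg -> Error msg
```

Here `demo ()` returns `Error "EPIPE"`: the third callback never runs, because the exception from the second one aborts the iteration and escapes `run`.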
I'm going to dig around to check that there isn't an exception leak on that side. But something that looks a bit more plausible is that cohttp adds a hook that's not protected: cohttp makes calls into Lwt_engine directly, relying on the low-level interface.
Sorry for the delay, I'll have a bit more of a look, don't hesitate to poke around cohttp or curl/ocurl on your side.
Thank you so much for the investigation! So if you're correct, my initial assumption that exceptions should not escape the event loop is wrong, and the problem is somewhere in cohttp. I'll look on that side. I'll let you close this if you deem it's indeed not a bug in Lwt.
it means that my initial assumption that exceptions should not escape the event loop is wrong
kinda yes, kinda no…
it's like: the event loop is mostly responsible for calling callbacks, callbacks are provided by the user, the user is responsible for them
so like: users should not let exceptions escape the callbacks that are passed to the event loop
which is not obvious and not really documented and …
the bug could be in cohttp or it could be in the curl dependency
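Following that rule, a callback handed to the event loop can be wrapped so that nothing escapes it, with any exception forwarded to a hook instead. Below is another stdlib-only toy; `guard`, `on_event`, `run` and the local `async_exception_hook` ref are hypothetical. In real Lwt the corresponding hook is `Lwt.async_exception_hook`, and such a wrapper would go around the functions passed to `Lwt_engine` registration functions such as `Lwt_engine.on_writable`; whether cohttp's hook is actually the unprotected one is not established in this thread.

```ocaml
(* Toy model, NOT Lwt's implementation: a queue of callbacks plus a
   user-settable hook for exceptions the callbacks fail to handle. *)
exception Io_error of string

let pending : (unit -> unit) Queue.t = Queue.create ()

(* Hypothetical stand-in for a global async-exception hook. *)
let async_exception_hook : (exn -> unit) ref =
  ref (fun exn -> prerr_endline ("unhandled: " ^ Printexc.to_string exn))

(* Wrap a callback so that no exception can escape into the loop. *)
let guard (f : unit -> unit) : unit -> unit =
 fun () -> try f () with exn -> !async_exception_hook exn

let on_event f = Queue.add f pending

(* The loop itself still has no handler: safety comes from [guard]. *)
let run () =
  while not (Queue.is_empty pending) do
    (Queue.pop pending) ()
  done

(* Returns the exceptions the hook saw and whether the later callback ran. *)
let demo () =
  let seen = ref [] in
  async_exception_hook := (fun exn -> seen := exn :: !seen);
  let ran_after = ref false in
  on_event (guard (fun () -> raise (Io_error "EPIPE")));
  on_event (guard (fun () -> ran_after := true));
  run ();
  (List.rev !seen, !ran_after)
```

`demo ()` returns the single `Io_error "EPIPE"` seen by the hook together with `true`, because the second callback still runs. If the hook itself raises, that exception still reaches the loop, and a real wrapper might also prefer to let fatal exceptions such as `Out_of_memory` propagate.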