
Eio.Flow.read hangs on closed flow

quartz55 opened this issue • 3 comments

Not sure if I'm doing something wrong, but I've run into the following issue (on the Luv backend).

If a fiber is waiting on an Eio.Flow.read when another fiber closes the flow, it will hang forever, at least for a client flow obtained from Eio.Net.connect.

I've managed to trim it down to the following case:

reproduce.ml
open Eio.Std

let main env =
  let run_client ~net ~addr =
    traceln "client on";
    Switch.run (fun sw ->
        let flow = Eio.Net.connect ~sw net addr in
        traceln "client <-> server";
        let read_loop () =
          let b = Cstruct.create 4 in
          traceln "client will be stuck after this";
          while true do
            Eio.Flow.read flow b |> ignore;
            traceln "unreachable"
          done
        in
        let d = "Hello from client" in
        traceln "client -> %S" d;
        Eio.Flow.copy_string d flow;
        Fiber.fork ~sw read_loop;
        (* The forked fiber blocks in Eio.Flow.read; closing the flow here
           makes that read hang forever (the reported bug). *)
        Eio.Flow.close flow);
    traceln "client off"
  in
  let run_server socket =
    traceln "server on";
    Switch.run (fun sw ->
        Eio.Net.accept_sub socket ~sw
          (fun ~sw:_ flow _addr ->
            traceln "server <-> client";
            let b = Buffer.create 100 in
            Eio.Flow.copy flow (Eio.Flow.buffer_sink b);
            traceln "server <[EOF]- %S" (Buffer.contents b))
          ~on_error:(traceln "Error handling connection: %a" Fmt.exn));
    traceln "server off"
  in

  let net = Eio.Stdenv.net env in
  let addr = `Tcp (Eio.Net.Ipaddr.V4.loopback, 9999) in
  Switch.run @@ fun sw ->
  let server = Eio.Net.listen net ~sw ~reuse_addr:true ~backlog:5 addr in
  Fiber.both (fun () -> run_server server) (fun () -> run_client ~net ~addr)

The output is:

+server on
+client on
+client <-> server
+client -> "Hello from client"
+server <-> client
+client will be stuck after this
+server <[EOF]- "Hello from client"
+server off
*hangs forever*

— quartz55, May 05 '22

Indeed, it seems that libuv does not call the read callback in this case:

let or_fail = function
  | Ok x -> x
  | Error e -> failwith (Luv.Error.strerror e)

let () =
  let a, b = Luv.TCP.socketpair `STREAM 0 |> or_fail in
  let tcp_a = Luv.TCP.init () |> or_fail in
  let tcp_b = Luv.TCP.init () |> or_fail in
  Luv.TCP.open_ tcp_a a |> or_fail;
  Luv.TCP.open_ tcp_b b |> or_fail;
  (* Register a read callback, then close the handle while the read is pending. *)
  Luv.Stream.read_start tcp_a (fun _ -> print_endline "read a ready!");
  print_endline "close";
  Luv.Handle.close tcp_a ignore;   (* the read callback is never invoked *)
  let t = Luv.Timer.init () |> or_fail in
  Luv.Timer.start t 1000 (fun () -> print_endline "timer done") |> or_fail;
  ignore (Luv.Loop.run () : bool)

The output is just "close" followed by "timer done" (there is no "read a ready!").

strace shows:

epoll_ctl(5, EPOLL_CTL_DEL, 3, 0x7ffd474327f4) = -1 ENOENT (No such file or directory)
close(3)                                = 0

It would be nice if this worked, but (depending on what you're trying to do) it's probably cleaner to replace the close with shutdown, e.g.

         Eio.Flow.shutdown flow `Send

Then the server will read end-of-file and display the message from the client, then close its own connection, which will cause the client to read end-of-file. If the client wants to cancel its own read, it can use Eio's cancellation (Switch.fail, etc) rather than relying on closing the FD to do it.
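For illustration, a minimal sketch of that suggestion applied to the client from reproduce.ml above (it reuses open Eio.Std and the surrounding main; the End_of_file handler is an addition for the sketch, not part of the original report):

let run_client ~net ~addr =
  Switch.run (fun sw ->
      let flow = Eio.Net.connect ~sw net addr in
      let read_loop () =
        let b = Cstruct.create 4 in
        try
          while true do
            Eio.Flow.read flow b |> ignore
          done
        with End_of_file -> traceln "client <[EOF]- server"
      in
      let d = "Hello from client" in
      traceln "client -> %S" d;
      Eio.Flow.copy_string d flow;
      Fiber.fork ~sw read_loop;
      (* Only shut down the sending side (previously: Eio.Flow.close flow).
         The server sees end-of-file, prints the message and closes its
         connection, which in turn ends the client's read loop. *)
      Eio.Flow.shutdown flow `Send)

And if the client wants to abandon its own read, a hypothetical sketch of racing it against an assumed stop promise with Fiber.first, so the pending read is cancelled by Eio itself rather than by closing the FD:

(* [stop] is an assumed [unit Promise.t], resolved when the client decides
   to give up reading; Fiber.first cancels whichever branch loses. *)
Fiber.first
  (fun () -> read_loop ())
  (fun () -> Promise.await stop)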

— talex5, May 06 '22

Thanks, using shutdown instead does seem to sidestep the issue 👍

Funny thing, for reference, the source of my problem ended up being a Tls module loosely based on yours: https://gitlab.com/talex5/gemini-eio/-/blob/master/src/tls_eio.ml#L128

Replacing that close call with a shutdown `Send made everything work 🙌

— quartz55, May 07 '22

> Funny thing, for reference, the source of my problem ended up being a Tls module loosely based on yours:

Hmm, yes, that Tls wrapper should probably now implement Flow.two_way and provide a shutdown operation that sends the close notify and then calls shutdown on the underlying flow.

Then Tls_eio.close can just close the underlying flow immediately.
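For illustration only, a rough sketch of that split with assumed names (send_close_notify stands in for the call that writes the TLS close-notify alert; this is not the real tls-eio code):

(* Hypothetical sketch: [shutdown] says goodbye at the TLS level and then
   shuts down the transport, while [close] just releases the connection. *)
let shutdown ~send_close_notify socket cmd =
  (match cmd with
   | `Send | `All -> send_close_notify ()   (* write the close-notify alert *)
   | `Receive -> ());
  Eio.Flow.shutdown socket cmd

let close socket =
  Eio.Flow.close socket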

— talex5, May 13 '22

In fact, I think continuing to wait is the correct behaviour. close is an operation on the file descriptor, not on the file (socket) itself. The socket itself is shut down when there are no more users of it, but the read is a valid user.

— talex5, Aug 26 '22

Closing, as not a bug.

Note: there's now also a proper tls-eio package (https://github.com/mirleft/ocaml-tls/pull/451).

— talex5, Sep 27 '22