dream icon indicating copy to clipboard operation
dream copied to clipboard

Unix socket isn’t cleaned up on server shutdown

Open toastal opened this issue 6 months ago • 20 comments

I created a socket, but when the server shuts down, my socket file, dream.sock, is not removed. If I then try to restart the server I get the following error

exception Unix.Unix_error(Unix.EADDRINUSE, "bind", "")

This is frustrating since I manually need to remove the file before starting the server & the server should be cleaning up its own resources.

toastal avatar Jun 07 '25 14:06 toastal

Use Dream.serve which gives you a promise you can await and then do your cleanup? It was meant for this kind of use case. Eg

let stop =
  let promise, resolve = Lwt.wait () in
  Sys.set_signal Sys.sigterm (Signal_handle (fun _ -> Lwt.wakeup_later resolve ()));
  promise

let main () =
  let socket_path = "..." in
  Dream.serve ~stop ~socket_path @@ Dream.logger @@ Dream.router [...])
  [%lwt.finally Lwt_unix.unlink socket_path]

let () = Lwt_main.run (main ())

yawaramin avatar Jun 24 '25 18:06 yawaramin

Ok so man 2 bind on linux says it will return EADDRINUSE if the port numbers is being used. I suspect it's as simple as triggering some path where close isn't called

So I see Lwt_unix.unlink, but on the unlink man page:

If the name was the last link to a file but any processes still have the file open, the file will remain in existence
       until the last file descriptor referring to it is closed.

so surely that needs to be an explicit Lwt_unix.close as well?

LAC-Tech avatar Aug 08 '25 07:08 LAC-Tech

My understanding is that there is already a Lwt_unix.close inside Lwt_io.establish_server. The question is that when this is a Unix domain socket, do we also need to clean up the separate file system file that was created? What's the right way for a Unix server to handle a Unix domain socket? What do the various daemon managers expect it to do?

aantron avatar Aug 08 '25 09:08 aantron

Just as a data point, the Go people are saying it's the caller's responsibility: https://github.com/golang/go/issues/70985

They quoted this page: https://man7.org/linux/man-pages/man7/unix.7.html

Binding to a socket with a filename creates a socket in the filesystem that must be deleted by the caller when it is no longer needed (using unlink(2)). The usual UNIX close-behind semantics apply; the socket can be unlinked at any time and will be finally removed from the filesystem when the last reference to it is closed.

I am interpreting 'caller' in our case as the caller of the Dream.serve function.

Maybe adding an example to the tutorial is fine for this.

@toastal did you try my suggestion? Did you have any issues with it?

yawaramin avatar Aug 08 '25 16:08 yawaramin

the socket can be unlinked at any time

Does that mean we can proactively do the unlink on the socket immediately after establish_server returns?

aantron avatar Aug 08 '25 17:08 aantron

~Good catch! It does seem like it.~

On second thought, that might not work. When the server is established, the server needs to wait for a 'client' to open the socket file as well. It can't just immediately unlink the socket file because that would remove its directory entry and prevent the client from accessing it.

yawaramin avatar Aug 08 '25 18:08 yawaramin

It seems like it needs testing. Why would the socket file be removed? There is an open fd in the server process referring to it.

aantron avatar Aug 08 '25 19:08 aantron

I guess the same reason why a file can be deleted in a Unix filesystem even though it was opened by a process? The inode still exists but it can't be accessed from the filesystem because it was unlinked. But yeah, we should double-check.

yawaramin avatar Aug 08 '25 20:08 yawaramin

If it is the user’s responsibility, it should at least be in the docs I think as I wasn’t expecting this behavior with novice experience in this space.

toastal avatar Aug 09 '25 05:08 toastal

At least Dream.run ought to do this cleanup internally, it's supposed to be the no-boilerplate API.

aantron avatar Aug 09 '25 05:08 aantron

And as for Dream.serve, I think this would be decided by whether there is ever any legitimate reason to leave the file around. Can we make the decision for the user to always clean it up? Or are there reasons why a user might want to keep it around longer than the server is running?

aantron avatar Aug 09 '25 05:08 aantron

If Dream.run uses a default stop promise that resolves on SIGTERM, agree it's simple for it to do this cleanup.

For serve, I don't know if we can say with certainty that the socket file should never outlive the server, so imho we shouldn't do the cleanup there. This should be documented of course.

yawaramin avatar Aug 09 '25 06:08 yawaramin

When doing automatic removal we should also be sure to check what happens if the socket is an abstract Unix domain socket. I guess unlink would fail in that case. We should probably not call it to begin with if we have an abstract socket.

aantron avatar Aug 09 '25 06:08 aantron

By the way, @toastal, would your original use case be best served by an abstract Unix domain socket, to begin with?

aantron avatar Aug 09 '25 06:08 aantron

By the way, @toastal, would your original use case be best served by an abstract Unix domain socket, to begin with?

What is “abstract” here? This means it lives in the kernel & there is no cleanup? I still need a path of sorts, so probably needs to be a file path.

proxy.reverse.url = "http://[unix:/path/to/socket]/path"

toastal avatar Aug 09 '25 12:08 toastal

Apparently in Linux abstract socket paths are just 'null byte + path' so eg "\000/path/to/socket". But beware that someone named 'Lennart' warns of potential security issues with them: https://utcc.utoronto.ca/~cks/space/blog/linux/SocketAbstractNamespace?showcomments#comments (for context, given the fact that the comment was on Chris Siebenmann's blog, I suspect it's Lennart Poettering, creator of systemd)

yawaramin avatar Aug 09 '25 15:08 yawaramin

The standard solution here is to unlink the socket file right before calling bind, since the cleanup code won't run when the process isn't closed gracefully. Here's an example of syslogd calling unlink before bind. [^1]

One can try to remove the socket file when closing gracefully to avoid leaving it behind (both libuv and the Go runtime do this by default, for example). That's fine if you simply stop-start the server, but breaks when starting a new instance before closing the old one: The new instance will unlink and bind its own socket, and then the old one will unlink the new socket instead. That workflow is common to minimize downtime, so unlinking on close is usually optional (not sure if that use case matters here though).

@yawaramin I too suspect that's Poettering. And I agree on this one, if only because they don't interact with the existing (file) permissions model and you need to reimplement that yourself with SO_PEERCRED. Also they're Linux-only, whereas AF_UNIX SOCK_STREAM exists even on Windows now.

[^1]: Edited to link to the latest code, not the revision I found linked.

amongonz avatar Aug 10 '25 12:08 amongonz

Good point. Unlinking immediately before starting the Dream server seems like the best option overall. And even better, Dream can just do it pretty trivially.

yawaramin avatar Aug 10 '25 15:08 yawaramin

Seeing PR #407 deleting the file got me thinking -- do we ever need to be careful to check that the file is a socket to begin with? That is, what if there's a regular file at socket's path, and this causes us to unlink it? What's the standard practice in the wider world of network and daemon programming?

aantron avatar Oct 02 '25 09:10 aantron

We can easily check that the file is a socket, if it exists, using Lwt_unix.stat.

yawaramin avatar Oct 02 '25 15:10 yawaramin