Unix socket isn’t cleaned up on server shutdown
I created a socket, but when the server shuts down, my socket file, dream.sock, is not removed. If I then try to restart the server I get the following error
exception Unix.Unix_error(Unix.EADDRINUSE, "bind", "")
This is frustrating since I manually need to remove the file before starting the server & the server should be cleaning up its own resources.
Use Dream.serve which gives you a promise you can await and then do your cleanup? It was meant for this kind of use case. Eg
let stop =
let promise, resolve = Lwt.wait () in
Sys.set_signal Sys.sigterm (Signal_handle (fun _ -> Lwt.wakeup_later resolve ()));
promise
let main () =
let socket_path = "..." in
Dream.serve ~stop ~socket_path @@ Dream.logger @@ Dream.router [...])
[%lwt.finally Lwt_unix.unlink socket_path]
let () = Lwt_main.run (main ())
Ok so man 2 bind on linux says it will return EADDRINUSE if the port numbers is being used. I suspect it's as simple as triggering some path where close isn't called
So I see Lwt_unix.unlink, but on the unlink man page:
If the name was the last link to a file but any processes still have the file open, the file will remain in existence
until the last file descriptor referring to it is closed.
so surely that needs to be an explicit Lwt_unix.close as well?
My understanding is that there is already a Lwt_unix.close inside Lwt_io.establish_server. The question is that when this is a Unix domain socket, do we also need to clean up the separate file system file that was created? What's the right way for a Unix server to handle a Unix domain socket? What do the various daemon managers expect it to do?
Just as a data point, the Go people are saying it's the caller's responsibility: https://github.com/golang/go/issues/70985
They quoted this page: https://man7.org/linux/man-pages/man7/unix.7.html
Binding to a socket with a filename creates a socket in the filesystem that must be deleted by the caller when it is no longer needed (using unlink(2)). The usual UNIX close-behind semantics apply; the socket can be unlinked at any time and will be finally removed from the filesystem when the last reference to it is closed.
I am interpreting 'caller' in our case as the caller of the Dream.serve function.
Maybe adding an example to the tutorial is fine for this.
@toastal did you try my suggestion? Did you have any issues with it?
the socket can be unlinked at any time
Does that mean we can proactively do the unlink on the socket immediately after establish_server returns?
~Good catch! It does seem like it.~
On second thought, that might not work. When the server is established, the server needs to wait for a 'client' to open the socket file as well. It can't just immediately unlink the socket file because that would remove its directory entry and prevent the client from accessing it.
It seems like it needs testing. Why would the socket file be removed? There is an open fd in the server process referring to it.
I guess the same reason why a file can be deleted in a Unix filesystem even though it was opened by a process? The inode still exists but it can't be accessed from the filesystem because it was unlinked. But yeah, we should double-check.
If it is the user’s responsibility, it should at least be in the docs I think as I wasn’t expecting this behavior with novice experience in this space.
At least Dream.run ought to do this cleanup internally, it's supposed to be the no-boilerplate API.
And as for Dream.serve, I think this would be decided by whether there is ever any legitimate reason to leave the file around. Can we make the decision for the user to always clean it up? Or are there reasons why a user might want to keep it around longer than the server is running?
If Dream.run uses a default stop promise that resolves on SIGTERM, agree it's simple for it to do this cleanup.
For serve, I don't know if we can say with certainty that the socket file should never outlive the server, so imho we shouldn't do the cleanup there. This should be documented of course.
When doing automatic removal we should also be sure to check what happens if the socket is an abstract Unix domain socket. I guess unlink would fail in that case. We should probably not call it to begin with if we have an abstract socket.
By the way, @toastal, would your original use case be best served by an abstract Unix domain socket, to begin with?
By the way, @toastal, would your original use case be best served by an abstract Unix domain socket, to begin with?
What is “abstract” here? This means it lives in the kernel & there is no cleanup? I still need a path of sorts, so probably needs to be a file path.
proxy.reverse.url = "http://[unix:/path/to/socket]/path"
Apparently in Linux abstract socket paths are just 'null byte + path' so eg "\000/path/to/socket". But beware that someone named 'Lennart' warns of potential security issues with them: https://utcc.utoronto.ca/~cks/space/blog/linux/SocketAbstractNamespace?showcomments#comments (for context, given the fact that the comment was on Chris Siebenmann's blog, I suspect it's Lennart Poettering, creator of systemd)
The standard solution here is to unlink the socket file right before calling bind, since the cleanup code won't run when the process isn't closed gracefully. Here's an example of syslogd calling unlink before bind. [^1]
One can try to remove the socket file when closing gracefully to avoid leaving it behind (both libuv and the Go runtime do this by default, for example). That's fine if you simply stop-start the server, but breaks when starting a new instance before closing the old one: The new instance will unlink and bind its own socket, and then the old one will unlink the new socket instead. That workflow is common to minimize downtime, so unlinking on close is usually optional (not sure if that use case matters here though).
@yawaramin I too suspect that's Poettering. And I agree on this one, if only because they don't interact with the existing (file) permissions model and you need to reimplement that yourself with SO_PEERCRED. Also they're Linux-only, whereas AF_UNIX SOCK_STREAM exists even on Windows now.
[^1]: Edited to link to the latest code, not the revision I found linked.
Good point. Unlinking immediately before starting the Dream server seems like the best option overall. And even better, Dream can just do it pretty trivially.
Seeing PR #407 deleting the file got me thinking -- do we ever need to be careful to check that the file is a socket to begin with? That is, what if there's a regular file at socket's path, and this causes us to unlink it? What's the standard practice in the wider world of network and daemon programming?
We can easily check that the file is a socket, if it exists, using Lwt_unix.stat.