interprocess
commit 7ac601 broke sockets on linux (or at least WSL)
I was trying to build this crate from main to see if recent changes would fix an issue I was having, but instead I started getting this error:
error: could not write outgoing message to socket
Caused by:
Transport endpoint is not connected (os error 107)
I did a git bisect and found that it works on commit d2ad88cbaea9b4b3a5bfb3168b712d8a874620bf, but not on commit 7ac6013ca32b5ff128fe9fa96b29b7d540d367ed.
This should be reproducible at least on WSL, but I imagine it happens on Ubuntu as well. I'm not doing anything too fancy here: it's pretty much just the local socket example, but with some strongly typed serde messages and no tokio.
I can't quite tell which change in 7ac6013 causes this, but I actually ran into it myself while refactoring the examples a bit and adding tests that mirror them but run fully automatically. I also happen to be using WSL (mine is Ubuntu on WSL 2).
I also managed to get the examples working again by introducing synchronization (the client waits for the server to start, using a one-shot std::sync::mpsc channel), but I haven't pushed those changes yet. Are you starting the client after the server, or at the same time? If it's the latter, then this is certainly an extremely weird error-handling bug: when the client fails to connect, it doesn't report that it couldn't find the server.
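For illustration, the synchronization described above can be sketched like this. It is an assumption-laden stand-in, not the crate's actual example code: it uses `std::net::TcpListener`/`TcpStream` on loopback instead of interprocess's local sockets, and `run_synchronized` is a hypothetical name. The key point is the ordering: the server binds first and only then signals readiness over a one-shot `std::sync::mpsc` channel, so the client never attempts to connect before the socket exists.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs a server on a background thread and a client on
/// the calling thread, with the client blocked until the server has bound.
fn run_synchronized() -> io::Result<String> {
    // One-shot use: the server sends exactly one message (its bound address).
    let (ready_tx, ready_rx) = mpsc::channel();

    let server = thread::spawn(move || -> io::Result<()> {
        // Bind first, then signal readiness. Port 0 lets the OS pick a free port.
        let listener = TcpListener::bind("127.0.0.1:0")?;
        ready_tx
            .send(listener.local_addr()?)
            .expect("client stopped waiting for the server");
        let (mut conn, _) = listener.accept()?;
        // Read one request line, then answer it.
        let mut request = String::new();
        BufReader::new(conn.try_clone()?).read_line(&mut request)?;
        if request == "ping\n" {
            conn.write_all(b"pong\n")?;
        }
        Ok(())
    });

    // If the server failed before sending, its sender was dropped and recv() errors.
    let addr = ready_rx
        .recv()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "server exited before binding"))?;
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"ping\n")?;
    let mut reply = String::new();
    BufReader::new(client).read_line(&mut reply)?;

    server.join().expect("server thread panicked")?;
    Ok(reply)
}
```

Calling `run_synchronized()` should yield `"pong\n"`. Removing the `recv()` wait is what reintroduces the race: the client may then try to connect before the listener exists.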
I'll investigate this as soon as I finish writing the tests (yes, I'm finally doing test-driven development here, more or less, took me long enough).
Ah yeah, I'm actually doing that synchronization logic myself! You can see that here, so I don't think that's the issue. To be clear, this happens when trying to connect to a socket that was bound by the same process. I don't think it happens when spinning up separate processes (but since the first one won't start, it's hard to confirm that).
Alright, I tracked this one down: errors produced by the connect system call were never read into an Err variant due to a typo in checking its return value, so connection failures were reported as Ok. That later surfaced as "transport endpoint not connected" as soon as you tried to do anything with the faux connection.
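To make the failure mode concrete, here is a minimal sketch of the usual pattern for turning a libc-style return value into a Rust `Result`. The actual typo in interprocess isn't quoted in this thread, so `check_ret` and `check_ret_buggy` are hypothetical names, and the buggy comparison is only an illustration of how a single wrong operator can let a failed `connect` through as `Ok`.

```rust
use std::io;

/// Maps the raw return value of a libc-style call (e.g. `connect(2)`) to an
/// `io::Result`. Such calls return -1 on failure and leave the reason in
/// `errno`, which `io::Error::last_os_error()` reads back.
fn check_ret(ret: i32) -> io::Result<i32> {
    if ret == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}

/// Hypothetical buggy variant for contrast: `ret < -1` is never true for -1,
/// so a failed call is passed through as `Ok(-1)` and errno is never read.
fn check_ret_buggy(ret: i32) -> io::Result<i32> {
    if ret < -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}
```

With the buggy check, the caller receives what looks like a connected socket; the first read or write on it then fails with ENOTCONN (os error 107 on Linux), which matches the error reported at the top of this issue.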
I also added a test to verify that connect failures are reported properly in case this ever regresses (which it shouldn't, really), but haven't pushed it yet since I'm still polishing the neat little testing setup I got there.
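The regression test itself isn't shown here, so the following is only a sketch of what such a check could look like. It uses `std::net::TcpStream` on loopback as a stand-in for interprocess's local sockets, and both helper names are hypothetical. The idea is to connect to an address where nothing is listening and require an immediate `Err`, rather than a stream that only fails on first use.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Hypothetical helper: returns a loopback address that very likely has no
/// listener, by binding to an OS-assigned port and then closing the listener.
fn unused_local_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    drop(listener); // Nothing listens on `addr` any more.
    Ok(addr)
}

/// Hypothetical regression check: connecting where no server is listening
/// must surface an error from `connect` itself.
fn connect_failure_is_reported() -> io::Result<bool> {
    let addr = unused_local_addr()?;
    Ok(TcpStream::connect(addr).is_err())
}
```

There is a small theoretical race (another process could grab the port between `drop` and `connect`), which is acceptable for a sketch but worth keeping in mind for a real test suite.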
Anyway, the latest commit on the main branch, 7fa31fd, fixes this. Since the error you saw was a connect failure in disguise, all signs point to some sort of problem with your synchronization logic, which I suppose you can debug much more easily now. (I'll leave this open for now and close it when 1.2.0 hits crates.io.)
1.2.0 is finally out, so I'm closing this one!