commit 7ac601 broke sockets on linux (or at least WSL)

Open EverlastingBugstopper opened this issue 2 years ago • 3 comments

i was trying to build this crate from main to see if recent changes would fix an issue i was having, but instead i started getting this error:

error: could not write outgoing message to socket

Caused by:
    Transport endpoint is not connected (os error 107)

i did a git bisect and discovered that the behavior works on commit d2ad88cbaea9b4b3a5bfb3168b712d8a874620bf, but not on commit 7ac6013ca32b5ff128fe9fa96b29b7d540d367ed.

this should be reproducible at least on WSL, but i imagine it happens on ubuntu as well. i'm not doing anything too fancy here, pretty much just the example local socket but with some strongly typed serde messages - no tokio.

EverlastingBugstopper, Aug 25 '22 15:08

I can't quite tell which change in 7ac6013 causes this, but I actually ran into this myself while refactoring the examples a bit and adding tests that mirror them but are fully automatic. I also happen to be using WSL (mine's Ubuntu in WSL 2.0).

I also managed to get the examples working again by introducing synchronization (client waits for the server to start using a one-shot std::sync::mpsc), but haven't yet pushed those changes. Are you starting the client after the server, or at the same time? If it's the latter, then it's certainly an extremely weird error handling bug that happens when the client fails to connect, failing to report that it couldn't find the server.
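For anyone reading along, the synchronization described above could look roughly like the sketch below. It is only an illustration: it uses `std::os::unix::net` from the standard library as a stand-in rather than this crate's own local socket types, and `ping_with_sync` and the socket path are made-up names, not anything from the examples or tests.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs a server thread and a client in the same
/// process and returns whatever the server received from the client.
fn ping_with_sync(path: PathBuf) -> io::Result<String> {
    // Remove a stale socket file left over from a previous run, if any.
    let _ = std::fs::remove_file(&path);

    // Used as a one-shot channel: the server sends a single `()` once it
    // is bound and listening.
    let (ready_tx, ready_rx) = mpsc::channel::<()>();

    let server_path = path.clone();
    let server = thread::spawn(move || -> io::Result<String> {
        // If bind fails, `?` returns early and drops `ready_tx`, which
        // wakes the client up with a receive error instead of hanging.
        let listener = UnixListener::bind(&server_path)?;
        let _ = ready_tx.send(());

        let (mut conn, _addr) = listener.accept()?;
        let mut received = String::new();
        conn.read_to_string(&mut received)?;
        Ok(received)
    });

    // Client side: block until the server reports that it is listening.
    if ready_rx.recv().is_err() {
        // The sender was dropped without signalling, so surface the
        // server's own error (for example, a failed bind).
        return server.join().expect("server thread panicked");
    }

    let mut client = UnixStream::connect(&path)?;
    client.write_all(b"ping")?;
    // Closing the client lets the server's read_to_string see EOF.
    drop(client);

    let received = server.join().expect("server thread panicked")?;
    let _ = std::fs::remove_file(&path);
    Ok(received)
}
```

Calling `ping_with_sync` with a fresh path under the temp directory should hand back the string the client wrote. The point of the channel is that the client never calls `connect` before `bind` has returned, so a failed connect cannot be blamed on startup ordering.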

I'll investigate this as soon as I finish writing the tests (yes, I'm finally doing test-driven development here, more or less, took me long enough).

kotauskas, Aug 31 '22 14:08

ah yeah i'm doing that synchronization logic myself actually! you can see that here, so i don't think that's the issue. to be clear - this happens when trying to connect to a socket that was bound to by the same process. i don't think it happens when spinning up new processes (but since the first one won't start it's hard to confirm that).

EverlastingBugstopper, Aug 31 '22 17:08

Alright, I tracked this one down: errors produced by the connect system call were never turned into an Err variant due to a typo in the check of its return value. As a result, connection failures were reported as Ok, and only surfaced later as "Transport endpoint is not connected" as soon as you tried to do anything with the faux-connection.
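To illustrate the class of bug (not the actual code from the crate, and the exact typo isn't shown in this thread), here is a minimal sketch of the check that has to happen after a C-style call such as `connect(2)`, which returns 0 on success and -1 on failure with `errno` set. `cvt_connect` is a hypothetical name, and errno is passed in explicitly so the sketch needs no FFI; real code would more likely call `io::Error::last_os_error()`.

```rust
use std::io;

/// Hypothetical helper: converts the return value of a C-style call
/// into an `io::Result`. A return of -1 signals failure, and `errno`
/// carries the OS error code that explains it.
fn cvt_connect(ret: i32, errno: i32) -> io::Result<()> {
    if ret == -1 {
        // Without this branch (or with a mistyped comparison), the
        // failure would be silently reported as Ok.
        Err(io::Error::from_raw_os_error(errno))
    } else {
        Ok(())
    }
}
```

If the comparison is wrong, every failed connect comes back as `Ok`, and the first read or write on the unconnected socket is what finally fails, which matches the os error 107 in the original report.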

I also added a test to verify that connect failures are reported properly in case this ever regresses (which it shouldn't, really), but haven't pushed it yet since I'm still polishing the neat little testing setup I got there.

Anyway, the latest commit on the main branch, 7fa31fd, fixes this. All signs point to some sort of problem with your synchronization logic, which I suppose you can debug much more easily now. (I'll leave this open for now to close when 1.2.0 hits Crates.io.)

kotauskas, Sep 07 '22 18:09

1.2.0 is finally out, so I'm closing this one!

kotauskas, Nov 03 '22 14:11