nix
nix copied to clipboard
reimplement sendmmsg/recvmmsg with better API
Motivation is explained in #1602, new version allows to receive data without performing allocations inside the receive loop and to use received data without extra copying.
This pull request contains a breaking change to API recvmmsg
(obviously) and also affects recvmsg
- new version does not set length of control message buffer if one is passed. Later change can be avoided with a bit more copy-paste.
Fixes #1602
Would you be willing to make a similar change for
sendmmsg
?
I can but it would take some time. It would be great if https://github.com/nix-rust/nix/pull/1748 was merged before that. Currently making any changes to socket/mod.rs
is unnecessary painful with the editor configured to automatically format everything.
Do you have an example of retrofitting an existing caller to use the new API you can share?
Sort of. I updated the tests in test/sys/test_socket.rs
to use new API, nothing crazy. Just one extra parameters allocated in advance, passed into recvmmsg
and implicit collect is replaced with explicit one. Other than that I can't really imagine how would you use existing API. https://github.com/nix-rust/nix/issues/1602 contains the explanation of what's wrong with the old API and it also contains some "me too" messages, plus a few more of "me too" got removed.
I can but it would take some time. It would be great if https://github.com/nix-rust/nix/pull/1748 was merged before that. Currently making any changes to socket/mod.rs is unnecessary painful with the editor configured to automatically format everything.
Fair enough. The alternative would be to just disable the auto-formatting, but I agree that's annoying to do for one repo (when the proper fix is apparent). I've been reluctant to submit any bulk reformatting changes up until now to avoid causing merge conflicts for all outstanding PRs, but I think we've reached a cross-over point where most PRs already have merge conflicts anyways so we should just do it. @asomers thoughts?
Yeah, I think it's time to do it. I for one don't have any big outstanding PRs.
Would you be willing to make a similar change for
sendmmsg
?
Ended up implementing it. Tests seem to work. Don't have use cases in real code so can't test it properly.
@pacak can you rebase and reformat? Thanks!
Seems to work.
Something seems to be wrong here. This testcase: https://gist.github.com/rroohhh/acb6247f7634e4e68d53604c4c3f88b1 fails in debug mode with
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `[207]`,
right: `[170]`', src/bin/testcase.rs:52:9
and in release mode with
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: EFAULT', src/bin/testcase.rs:38:6
Something seems to be wrong here. This testcase: https://gist.github.com/rroohhh/acb6247f7634e4e68d53604c4c3f88b1
Interesting. Moving buffers allocation out from lines 33 and 45 fixes the problem for me, looks like something holds an invalid pointer in both cases... Thank you for a great test case, I'll see how to fix the problem tomorrow.
A good test case.
A temporary array created on lines 33/45 used to be stored inside a pointer in MultHdrs
structure, I fixed it by requiring a reference with appropriate lifetime, now the test case fails to compile.
Looks like nightly clippy introduced some new warnings. I'll look at fixing them in the next day or two.
@pacak thank you for the swift response and fix
@pacak apologies for the delay in responding to this. I've been trying to track down any potentially impacted clients to verify that this new solution fully meets their needs, but had previously been unsuccessful. Fortunately I've just found two.
@amatissart your tokio-udt
crate makes use of sendmmsg
and recvmmsg
. In the interest of helping us avoid breaking clients, would you be willing to try adopting your crate to these proposed changes?
@nullchinchilla same as above, except with the sosistab
crate and just recvmmsg
.
It'd be greatly appreciated if you're willing to help out, but there's no obligation. Thank you both either way. 😃
@pacak I tried implementing these changes for the tokio-udt
crate, but ran into issues regarding MultHdrs<_>
not implementing Send
:
help: within
[mmsghdr]
, the traitSend
is not implemented for*mut iovec
Can you try getting tokio-udt
building with your changes, and see if anything else is needed?
@pacak I tried implementing these changes for the
tokio-udt
crate, but ran into issues regardingMultHdrs<_>
not implementingSend
:help: within
[mmsghdr]
, the traitSend
is not implemented for*mut iovec
Can you try getting
tokio-udt
building with your changes, and see if anything else is needed?
Thanks for the ping, and sorry for the late response.
I made it compile on this branch: https://github.com/amatissart/tokio-udt/compare/nix-mmsg-upgrade
It seems that MultHdrs<_>
doesn't need to implement Send
(in this use case at least), as the headers have to be preallocated before each call to recvmmsg
anyway. Do I understand that correctly?
Note that "tokio-udt" is still very experimental. My usage of sendmmsg/recvmmsg might not be completely relevant in the first place.
as the headers have to be preallocated before each call to
recvmmsg
anyway. Do I understand that correctly?
The idea is that you preallocate them once then keep reusing it between calls. Preallocating them before each call gives you the behavior similar to the old one. If you want to use it with async you can either preallocate before each call or try to share the same set of buffers using mutex.
If you want to use it with async you can either preallocate before each call or try to share the same set of buffers using mutex.
Ok, that makes sense. But as pointed by @rtzoeller, using a Mutex<>
to reuse the headers does not seem to be possible, as MultHdrs<_>
is not Send
.
I'm not sure if preallocated headers can be made send - types from libc are not and the whole idea for this change is to avoid extra allocations between calls - if you care about making less system calls you want to avoid allocations as well. If you really want to use it from multiple threads you can stick to allocating and immediately using them, after this change you can consume payload from iterator instead of a vector.
Hmm... Is it not running in qemu? The test that is fails is one that checks the timestamps added by my previous pull request. I though it might not work when it's running in qemu. is aarch64 specific in some other way?
Hmm... Is it not running in qemu? The test that is fails is one that checks the timestamps added by my previous pull request. I though it might not work when it's running in qemu. is aarch64 specific in some other way?
It is not running in qemu; Cirrus has a native ARM container.
https://github.com/nix-rust/nix/blob/5252f7e1bd1361b47a655ae46a02c32d604483ee/.cirrus.yml#L145-L153
@pacak curiously, it passed on your original commit pre-bors. I'm going have bors try again (but not submit), to get a feel for how flaky this is. I'm not familiar with any aarch64 quirks here.
bors try
try
Build succeeded:
- FreeBSD 12 amd64 & i686
- Linux powerpc64le
- Redox x86_64
- Haiku x86_64
- Linux x32
- OpenBSD x86_64
- Android armv7
- iOS x86_64
- Linux arm-musleabi
- Linux i686
- Linux s390x
- DragonFly BSD x86_64
- macOS aarch64
- Android aarch64
- Linux armv7 gnueabihf
- iOS aarch64
- FreeBSD 14 amd64 & i686
- Linux mipsel
- Linux x86_64 musl
- Linux powerpc64
- Fuchsia x86_64
- Linux MIPS64 el
- Linux MIPS64
- Android x86_64
- Rust Formatter
- Illumos
- NetBSD x86_64
- Linux i686 musl
- Linux arm gnueabi
- Android arm
- Android i686
- Linux MIPS
- Linux aarch64
- Linux armv7 uclibceabihf
- Linux powerpc
- Linux x86_64
- macOS x86_64
- Minver
- Rust Stable
@asomers any ideas on this flaky test?
No, I don't have any quick guesses.
@pacak I dug out an old Raspberry Pi 3 and confirm that this failure reproduces on real hardware. I'm not sure why yet, but I'll try to look into it over the next few days.
I dug out an old Raspberry Pi 3 and confirm that this failure reproduces on real hardware. I'm not sure why yet, but I'll try to look into it over the next few days.
Thank you for the update, let me know if I can help you with anything.
@pacak a few observations; I've not had too much time to dig in:
- I can bang on
test_recvmm2
indefinitely on Linux AMD64 with no issues. - Anecdotally, there appears to be a "clumpiness" to the failures. Run in a loop until failure, I might see the test fail on the first iteration on three or four attempts in a row, and then on the next run take dozens of iterations to fail. Not sure what to make of this - perhaps an alignment issue or something similar where we get "lucky" frequently, in a streaky way?
- Here are the values of
recv
andrmsg
for the passing and failing cases on Linux aarch64. Perhaps you can make something out of them - it seems like we might be getting back data but not the control message?
Passing
[src/sys/socket/mod.rs:1827] &recv = MultiResults {
rmm: MultiHeaders {
items: [
mmsghdr {
msg_hdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffa4001290,
msg_iovlen: 2,
msg_control: 0x0000ffffa40013f0,
msg_controllen: 64,
msg_flags: 0,
},
msg_len: 400,
},
mmsghdr {
msg_hdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffa4001350,
msg_iovlen: 2,
msg_control: 0x0000ffffa4001430,
msg_controllen: 64,
msg_flags: 0,
},
msg_len: 0,
},
],
addresses: [
core::mem::maybe_uninit::MaybeUninit<()>,
core::mem::maybe_uninit::MaybeUninit<()>,
],
cmsg_buffers: Some(
[
64,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
37,
0,
0,
0,
11,
73,
4,
99,
0,
0,
0,
0,
84,
54,
93,
30,
0,
0,
<zeroes>,
0,
0,
],
),
msg_controllen: 64,
},
current_index: 0,
received: 1,
}
[src/sys/socket/mod.rs:1832] &rmsg = RecvMsg {
bytes: 400,
cmsghdr: Some(
cmsghdr {
cmsg_len: 64,
cmsg_level: 1,
cmsg_type: 37,
},
),
address: Some(
(),
),
flags: (empty),
iobufs: PhantomData,
mhdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffa4001290,
msg_iovlen: 2,
msg_control: 0x0000ffffa40013f0,
msg_controllen: 64,
msg_flags: 0,
},
}
Failing
[src/sys/socket/mod.rs:1827] &recv = MultiResults {
rmm: MultiHeaders {
items: [
mmsghdr {
msg_hdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffb4001290,
msg_iovlen: 2,
msg_control: 0x0000ffffb40013f0,
msg_controllen: 0,
msg_flags: 0,
},
msg_len: 400,
},
mmsghdr {
msg_hdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffb4001350,
msg_iovlen: 2,
msg_control: 0x0000ffffb4001430,
msg_controllen: 64,
msg_flags: 0,
},
msg_len: 0,
},
],
addresses: [
core::mem::maybe_uninit::MaybeUninit<()>,
core::mem::maybe_uninit::MaybeUninit<()>,
],
cmsg_buffers: Some(
[
0,
0,
<zeroes>,
0,
0,
],
),
msg_controllen: 64,
},
current_index: 0,
received: 1,
}
[src/sys/socket/mod.rs:1832] &rmsg = RecvMsg {
bytes: 400,
cmsghdr: None,
address: Some(
(),
),
flags: (empty),
iobufs: PhantomData,
mhdr: msghdr {
msg_name: 0x0000000000000000,
msg_namelen: 0,
msg_iov: 0x0000ffffb4001290,
msg_iovlen: 2,
msg_control: 0x0000ffffb40013f0,
msg_controllen: 0,
msg_flags: 0,
},
}
Yea, looks like we are getting the data but not the control header.
Things to try in no particular order:
- Try to change the test to transmit a bunch of messages at once and see if it fails in the first iteration or it can fail in the non first one - where it should perform no allocations.
- Try to backport this test to the old API in some way. It's a new test I added, maybe there's a problem with recvmmsg on aarch64
Rebased the branch, backported the test in https://github.com/nix-rust/nix/pull/1837
@rtzoeller Can you try running both this and backported tests on your Pi?