
reimplement sendmmsg/recvmmsg with better API

Open pacak opened this issue 2 years ago • 28 comments

Motivation is explained in #1602; the new version allows receiving data without performing allocations inside the receive loop and using the received data without extra copying.

This pull request contains a breaking change to the recvmmsg API (obviously) and also affects recvmsg: the new version does not set the length of the control message buffer if one is passed. The latter change could be avoided with a bit more copy-paste.

Fixes #1602

pacak avatar Jun 13 '22 02:06 pacak

Would you be willing to make a similar change for sendmmsg?

I can, but it would take some time. It would be great if https://github.com/nix-rust/nix/pull/1748 was merged before that. Currently, making any changes to socket/mod.rs is unnecessarily painful with an editor configured to automatically format everything.

Do you have an example of retrofitting an existing caller to use the new API you can share?

Sort of. I updated the tests in test/sys/test_socket.rs to use the new API, nothing crazy. Just one extra parameter allocated in advance and passed into recvmmsg, and the implicit collect is replaced with an explicit one. Other than that I can't really imagine how you would use the existing API. https://github.com/nix-rust/nix/issues/1602 contains an explanation of what's wrong with the old API; it also contains some "me too" messages, and a few more "me too"s that got removed.
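
For reference, a retrofitted call looks roughly like this. A sketch only: MultiHeaders::preallocate, the () address type (meaning "don't capture the peer address"), and the msg.bytes field are assumed from this branch and may still change.

use std::io::IoSliceMut;
use std::os::unix::io::AsRawFd;

use nix::sys::socket::{recvmmsg, MsgFlags, MultiHeaders, RecvMsg};

fn recv_batch(sock: &impl AsRawFd) -> nix::Result<()> {
    const BATCH: usize = 8;

    // Payload buffers and their IoSliceMut views.
    let mut buffers = [[0u8; 1500]; BATCH];
    let mut iovs: Vec<_> = buffers
        .iter_mut()
        .map(|b| [IoSliceMut::new(&mut b[..])])
        .collect();

    // The "one extra parameter allocated in advance": header storage that
    // recvmmsg fills in instead of allocating internally.
    let mut headers = MultiHeaders::<()>::preallocate(BATCH, None);

    // The old API's implicit collect becomes an explicit one.
    let msgs: Vec<RecvMsg<()>> = recvmmsg(
        sock.as_raw_fd(),
        &mut headers,
        iovs.iter_mut(),
        MsgFlags::empty(),
        None,
    )?
    .collect();

    for msg in msgs {
        println!("received {} bytes", msg.bytes);
    }
    Ok(())
}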

pacak avatar Jun 20 '22 03:06 pacak

I can, but it would take some time. It would be great if https://github.com/nix-rust/nix/pull/1748 was merged before that. Currently, making any changes to socket/mod.rs is unnecessarily painful with an editor configured to automatically format everything.

Fair enough. The alternative would be to just disable the auto-formatting, but I agree that's annoying to do for one repo (when the proper fix is apparent). I've been reluctant to submit any bulk reformatting changes up until now to avoid causing merge conflicts for all outstanding PRs, but I think we've reached a crossover point where most PRs already have merge conflicts anyway, so we should just do it. @asomers thoughts?

rtzoeller avatar Jun 20 '22 03:06 rtzoeller

Yeah, I think it's time to do it. I for one don't have any big outstanding PRs.

asomers avatar Jun 20 '22 20:06 asomers

Would you be willing to make a similar change for sendmmsg?

Ended up implementing it. Tests seem to work. Don't have use cases in real code so can't test it properly.

pacak avatar Jun 22 '22 02:06 pacak

@pacak can you rebase and reformat? Thanks!

rtzoeller avatar Jun 24 '22 02:06 rtzoeller

Seems to work.

pacak avatar Jun 24 '22 03:06 pacak

Something seems to be wrong here. This testcase: https://gist.github.com/rroohhh/acb6247f7634e4e68d53604c4c3f88b1 fails in debug mode with

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[207]`,
 right: `[170]`', src/bin/testcase.rs:52:9

and in release mode with

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: EFAULT', src/bin/testcase.rs:38:6

rroohhh avatar Jul 05 '22 13:07 rroohhh

Something seems to be wrong here. This testcase: https://gist.github.com/rroohhh/acb6247f7634e4e68d53604c4c3f88b1

Interesting. Moving the buffer allocation out of lines 33 and 45 fixes the problem for me; it looks like something is holding an invalid pointer in both cases... Thank you for the great test case, I'll see how to fix the problem tomorrow.

pacak avatar Jul 05 '22 13:07 pacak

A good test case.

A temporary array created on lines 33/45 used to be stored as a pointer inside the MultiHeaders structure. I fixed it by requiring a reference with an appropriate lifetime, so the test case now fails to compile.
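
Illustrative only - these are not the actual nix definitions, just the shape of the bug and of the fix:

use libc::iovec;
use std::io::IoSliceMut;
use std::marker::PhantomData;

// Buggy shape: a raw pointer quietly outlives the temporary array it
// was taken from.
struct Headers {
    iov: *mut iovec,
}

// Fixed shape, in spirit: the PhantomData borrow ties the struct to the
// caller's IoSliceMut array, so storing a pointer to a temporary is
// rejected by the borrow checker instead of dangling at runtime.
struct HeadersFixed<'a> {
    iov: *mut iovec,
    _slices: PhantomData<&'a mut [IoSliceMut<'a>]>,
}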

pacak avatar Jul 06 '22 02:07 pacak

Looks like nightly clippy introduced some new warnings. I'll look at fixing them in the next day or two.

rtzoeller avatar Jul 06 '22 12:07 rtzoeller

@pacak thank you for the swift response and fix

rroohhh avatar Jul 06 '22 18:07 rroohhh

@pacak apologies for the delay in responding to this. I've been trying to track down any potentially impacted clients to verify that this new solution fully meets their needs, but had previously been unsuccessful. Fortunately I've just found two.


@amatissart your tokio-udt crate makes use of sendmmsg and recvmmsg. In the interest of helping us avoid breaking clients, would you be willing to try adapting your crate to these proposed changes?

@nullchinchilla same as above, except with the sosistab crate and just recvmmsg.

It'd be greatly appreciated if you're willing to help out, but there's no obligation. Thank you both either way. 😃

rtzoeller avatar Jul 08 '22 05:07 rtzoeller

@pacak I tried implementing these changes for the tokio-udt crate, but ran into issues regarding MultiHeaders<_> not implementing Send:

help: within [mmsghdr], the trait Send is not implemented for *mut iovec

Can you try getting tokio-udt building with your changes, and see if anything else is needed?

rtzoeller avatar Aug 04 '22 05:08 rtzoeller

@pacak I tried implementing these changes for the tokio-udt crate, but ran into issues regarding MultiHeaders<_> not implementing Send:

help: within [mmsghdr], the trait Send is not implemented for *mut iovec

Can you try getting tokio-udt building with your changes, and see if anything else is needed?

Thanks for the ping, and sorry for the late response.

I made it compile on this branch: https://github.com/amatissart/tokio-udt/compare/nix-mmsg-upgrade. It seems that MultiHeaders<_> doesn't need to implement Send (in this use case at least), as the headers have to be preallocated before each call to recvmmsg anyway. Do I understand that correctly?

Note that "tokio-udt" is still very experimental. My usage of sendmmsg/recvmmsg might not be completely relevant in the first place.

amatissart avatar Aug 04 '22 16:08 amatissart

as the headers have to be preallocated before each call to recvmmsg anyway. Do I understand that correctly?

The idea is that you preallocate them once and then keep reusing them between calls. Preallocating them before each call gives you behavior similar to the old one. If you want to use it with async you can either preallocate before each call or share the same set of buffers using a mutex.
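
Roughly, the reuse pattern looks like this (a sketch, with the same assumptions about preallocate and msg.bytes as the earlier example):

use std::io::IoSliceMut;
use std::os::unix::io::AsRawFd;

use nix::sys::socket::{recvmmsg, MsgFlags, MultiHeaders};

fn receive_loop(sock: &impl AsRawFd) -> nix::Result<()> {
    const BATCH: usize = 16;

    // Allocated once, outside the loop, and reused by every call.
    let mut buffers = [[0u8; 1500]; BATCH];
    let mut headers = MultiHeaders::<()>::preallocate(BATCH, None);

    loop {
        // Rebuilding the IoSliceMut views each iteration is cheap; the
        // payload buffers and header storage are never reallocated. The
        // small Vec of views is the only remaining per-iteration allocation.
        let mut iovs: Vec<_> = buffers
            .iter_mut()
            .map(|b| [IoSliceMut::new(&mut b[..])])
            .collect();

        for msg in recvmmsg(
            sock.as_raw_fd(),
            &mut headers,
            iovs.iter_mut(),
            MsgFlags::empty(),
            None,
        )? {
            let _ = msg.bytes; // consume each message before the next call
        }
    }
}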

pacak avatar Aug 04 '22 22:08 pacak

If you want to use it with async you can either preallocate before each call or share the same set of buffers using a mutex.

Ok, that makes sense. But as pointed out by @rtzoeller, using a Mutex<> to reuse the headers does not seem to be possible, as MultiHeaders<_> is not Send.

amatissart avatar Aug 05 '22 09:08 amatissart

I'm not sure preallocated headers can be made Send - the types from libc are not, and the whole idea of this change is to avoid extra allocations between calls; if you care about making fewer system calls you want to avoid allocations as well. If you really want to use it from multiple threads you can stick to allocating the headers and immediately using them; after this change you can consume the payload from an iterator instead of a vector.
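
For example (sketch, same assumptions as above): allocate inside the thread or task and consume the results directly from the iterator, so nothing that isn't Send has to cross a thread boundary.

use std::io::IoSliceMut;
use std::os::unix::io::RawFd;

use nix::sys::socket::{recvmmsg, MsgFlags, MultiHeaders};

// Everything recvmmsg touches is created, used, and dropped in this
// function, so MultiHeaders never needs to be Send.
fn recv_once(fd: RawFd, buffers: &mut [[u8; 1500]]) -> nix::Result<usize> {
    let mut iovs: Vec<_> = buffers
        .iter_mut()
        .map(|b| [IoSliceMut::new(&mut b[..])])
        .collect();
    let mut headers = MultiHeaders::<()>::preallocate(iovs.len(), None);

    let mut total = 0;
    // Payload is consumed straight from the iterator; no Vec of results
    // is ever built.
    for msg in recvmmsg(fd, &mut headers, iovs.iter_mut(), MsgFlags::empty(), None)? {
        total += msg.bytes;
    }
    Ok(total)
}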

pacak avatar Aug 12 '22 02:08 pacak

Build failed:

bors[bot] avatar Aug 12 '22 05:08 bors[bot]

Linux aarch64

Hmm... Is it not running in qemu? The test that fails is one that checks the timestamps added by my previous pull request. I thought it might not work when running in qemu. Is aarch64 specific in some other way?

pacak avatar Aug 12 '22 05:08 pacak

Linux aarch64

Hmm... Is it not running in qemu? The test that fails is one that checks the timestamps added by my previous pull request. I thought it might not work when running in qemu. Is aarch64 specific in some other way?

It is not running in qemu; Cirrus has a native ARM container.

https://github.com/nix-rust/nix/blob/5252f7e1bd1361b47a655ae46a02c32d604483ee/.cirrus.yml#L145-L153

rtzoeller avatar Aug 12 '22 05:08 rtzoeller

@pacak curiously, it passed on your original commit pre-bors. I'm going to have bors try again (but not submit), to get a feel for how flaky this is. I'm not familiar with any aarch64 quirks here.

bors try

rtzoeller avatar Aug 12 '22 05:08 rtzoeller

@asomers any ideas on this flaky test?

rtzoeller avatar Aug 12 '22 23:08 rtzoeller

No, I don't have any quick guesses.

asomers avatar Aug 12 '22 23:08 asomers

@pacak I dug out an old Raspberry Pi 3 and can confirm that this failure reproduces on real hardware. I'm not sure why yet, but I'll try to look into it over the next few days.

rtzoeller avatar Aug 22 '22 22:08 rtzoeller

I dug out an old Raspberry Pi 3 and can confirm that this failure reproduces on real hardware. I'm not sure why yet, but I'll try to look into it over the next few days.

Thank you for the update, let me know if I can help you with anything.

pacak avatar Aug 22 '22 22:08 pacak

@pacak a few observations; I've not had too much time to dig in:

  1. I can bang on test_recvmm2 indefinitely on Linux AMD64 with no issues.
  2. Anecdotally, there appears to be a "clumpiness" to the failures. Running in a loop until failure, I might see the test fail on the first iteration three or four attempts in a row, and then on the next run take dozens of iterations to fail. Not sure what to make of this - perhaps an alignment issue or something similar where we get "lucky" frequently, in a streaky way?
  3. Here are the values of recv and rmsg for the passing and failing cases on Linux aarch64. Perhaps you can make something out of them - it seems like we might be getting back data but not the control message?
Passing
[src/sys/socket/mod.rs:1827] &recv = MultiResults {
    rmm: MultiHeaders {
        items: [
            mmsghdr {
                msg_hdr: msghdr {
                    msg_name: 0x0000000000000000,
                    msg_namelen: 0,
                    msg_iov: 0x0000ffffa4001290,
                    msg_iovlen: 2,
                    msg_control: 0x0000ffffa40013f0,
                    msg_controllen: 64,
                    msg_flags: 0,
                },
                msg_len: 400,
            },
            mmsghdr {
                msg_hdr: msghdr {
                    msg_name: 0x0000000000000000,
                    msg_namelen: 0,
                    msg_iov: 0x0000ffffa4001350,
                    msg_iovlen: 2,
                    msg_control: 0x0000ffffa4001430,
                    msg_controllen: 64,
                    msg_flags: 0,
                },
                msg_len: 0,
            },
        ],
        addresses: [
            core::mem::maybe_uninit::MaybeUninit<()>,
            core::mem::maybe_uninit::MaybeUninit<()>,
        ],
        cmsg_buffers: Some(
            [
                64,
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                1,
                0,
                0,
                0,
                37,
                0,
                0,
                0,
                11,
                73,
                4,
                99,
                0,
                0,
                0,
                0,
                84,
                54,
                93,
                30,
                0,
                0,
                <zeroes>,
                0,
                0,
            ],
        ),
        msg_controllen: 64,
    },
    current_index: 0,
    received: 1,
}
[src/sys/socket/mod.rs:1832] &rmsg = RecvMsg {
    bytes: 400,
    cmsghdr: Some(
        cmsghdr {
            cmsg_len: 64,
            cmsg_level: 1,
            cmsg_type: 37,
        },
    ),
    address: Some(
        (),
    ),
    flags: (empty),
    iobufs: PhantomData,
    mhdr: msghdr {
        msg_name: 0x0000000000000000,
        msg_namelen: 0,
        msg_iov: 0x0000ffffa4001290,
        msg_iovlen: 2,
        msg_control: 0x0000ffffa40013f0,
        msg_controllen: 64,
        msg_flags: 0,
    },
}
Failing
[src/sys/socket/mod.rs:1827] &recv = MultiResults {
    rmm: MultiHeaders {
        items: [
            mmsghdr {
                msg_hdr: msghdr {
                    msg_name: 0x0000000000000000,
                    msg_namelen: 0,
                    msg_iov: 0x0000ffffb4001290,
                    msg_iovlen: 2,
                    msg_control: 0x0000ffffb40013f0,
                    msg_controllen: 0,
                    msg_flags: 0,
                },
                msg_len: 400,
            },
            mmsghdr {
                msg_hdr: msghdr {
                    msg_name: 0x0000000000000000,
                    msg_namelen: 0,
                    msg_iov: 0x0000ffffb4001350,
                    msg_iovlen: 2,
                    msg_control: 0x0000ffffb4001430,
                    msg_controllen: 64,
                    msg_flags: 0,
                },
                msg_len: 0,
            },
        ],
        addresses: [
            core::mem::maybe_uninit::MaybeUninit<()>,
            core::mem::maybe_uninit::MaybeUninit<()>,
        ],
        cmsg_buffers: Some(
            [
                0,
                0,
                <zeroes>,
                0,
                0,
            ],
        ),
        msg_controllen: 64,
    },
    current_index: 0,
    received: 1,
}
[src/sys/socket/mod.rs:1832] &rmsg = RecvMsg {
    bytes: 400,
    cmsghdr: None,
    address: Some(
        (),
    ),
    flags: (empty),
    iobufs: PhantomData,
    mhdr: msghdr {
        msg_name: 0x0000000000000000,
        msg_namelen: 0,
        msg_iov: 0x0000ffffb4001290,
        msg_iovlen: 2,
        msg_control: 0x0000ffffb40013f0,
        msg_controllen: 0,
        msg_flags: 0,
    },
}

rtzoeller avatar Aug 23 '22 03:08 rtzoeller

Yea, looks like we are getting the data but not the control header.

Things to try in no particular order:

  1. Try changing the test to transmit a bunch of messages at once and see whether it only fails on the first iteration or whether it can also fail on a later one - where it should perform no allocations.
  2. Try to backport this test to the old API in some way. It's a new test I added; maybe there's a problem with recvmmsg on aarch64 itself.

pacak avatar Aug 23 '22 04:08 pacak

Rebased the branch, backported the test in https://github.com/nix-rust/nix/pull/1837

pacak avatar Oct 06 '22 20:10 pacak

@rtzoeller Can you try running both this and backported tests on your Pi?

pacak avatar Oct 12 '22 22:10 pacak