netlink
Support batch messages
This is motivated by netfilter; changes to netfilter must be done through a series of messages (started by `NFNL_MSG_BATCH_BEGIN`, ended by `NFNL_MSG_BATCH_END`). The full batch needs to be submitted to the kernel in one write/sendto/..., otherwise the kernel will abort the batch.
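To make the framing concrete, here is a minimal sketch of what such a batch looks like on the wire (constants are from the Linux uapi headers `linux/netlink.h` and `linux/netfilter/nfnetlink.h`; the `push_msg` helper and the buffer layout code are illustrative, not part of this crate):

```rust
// Minimal sketch of the wire framing for an (empty) nfnetlink batch.
// Constants come from the Linux uapi headers; everything else is illustrative.
const NLMSG_HDRLEN: usize = 16; // sizeof(struct nlmsghdr), already 4-byte aligned
const NFNL_MSG_BATCH_BEGIN: u16 = 0x10;
const NFNL_MSG_BATCH_END: u16 = 0x11;
const NLM_F_REQUEST: u16 = 0x01;
const NFNL_SUBSYS_NFTABLES: u16 = 10;

/// Append one netlink message (header + payload) to `buf`, padding to 4 bytes.
fn push_msg(buf: &mut Vec<u8>, msg_type: u16, flags: u16, seq: u32, payload: &[u8]) {
    let len = (NLMSG_HDRLEN + payload.len()) as u32;
    buf.extend_from_slice(&len.to_ne_bytes());      // nlmsg_len
    buf.extend_from_slice(&msg_type.to_ne_bytes()); // nlmsg_type
    buf.extend_from_slice(&flags.to_ne_bytes());    // nlmsg_flags
    buf.extend_from_slice(&seq.to_ne_bytes());      // nlmsg_seq
    buf.extend_from_slice(&0u32.to_ne_bytes());     // nlmsg_pid
    buf.extend_from_slice(payload);
    while buf.len() % 4 != 0 {
        buf.push(0); // NLMSG_ALIGN padding
    }
}

fn main() {
    // struct nfgenmsg payload: family, version, res_id (big-endian).
    // For BATCH_BEGIN/END, res_id selects the subsystem the batch is locked to.
    let nfgen = |res_id: u16| -> Vec<u8> {
        let mut p = vec![0u8 /* AF_UNSPEC */, 0 /* NFNETLINK_V0 */];
        p.extend_from_slice(&res_id.to_be_bytes());
        p
    };

    let mut batch = Vec::new();
    push_msg(&mut batch, NFNL_MSG_BATCH_BEGIN, NLM_F_REQUEST, 1, &nfgen(NFNL_SUBSYS_NFTABLES));
    // ... the actual NFT_MSG_* change messages would go here ...
    push_msg(&mut batch, NFNL_MSG_BATCH_END, NLM_F_REQUEST, 2, &nfgen(NFNL_SUBSYS_NFTABLES));

    // The whole buffer must go out in ONE sendto()/write(); splitting it
    // across syscalls makes the kernel abort the batch.
    assert_eq!(batch.len(), 2 * (NLMSG_HDRLEN + 4));
}
```

The point is only the framing: one contiguous buffer, begin header first, end header last, everything aligned, submitted in a single syscall.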
@stbuehler Thanks for the patch. Could you provide test or example code to help me understand how this works at a higher level?
Hi. I'm working on some netfilter code; I don't think it's ready to show yet.
But here is a small example. netfilter changes need to be done in a "batch" request; it starts with a `BatchBegin` message and ends with a `BatchEnd` message. The kernel wants to read the end message from the same "buffer" as the begin message (I suppose so userspace can't block the "lock" held during a change). The batch is also locked to a certain netfilter subsystem.
```rust
let mut batch = handle.batch(SocketAddr::new(0, 0));
batch.notify(NetfilterMessage {
    header: NetfilterHeader {
        resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
        ..Default::default()
    },
    body: NetfilterBody::BatchBegin,
}.request_no_ack());
let c_table_response = batch.request(NetfilterMessage {
    header: NetfilterHeader {
        family: Family::INET,
        ..Default::default()
    },
    body: NetfilterBody::NewTable(NftTable {
        name: Some(std::ffi::OsString::from("filter")),
        flags: Some(0),
        ..Default::default()
    }),
}.create());
let c_table_set = batch.request(NetfilterMessage {
    header: NetfilterHeader {
        family: Family::INET,
        ..Default::default()
    },
    body: NetfilterBody::NewSet(NftSet {
        table: Some(std::ffi::OsString::from("filter")),
        name: Some(std::ffi::OsString::from("allow")),
        key_type: Some(9), // "ether_addr" in userspace nft; the kernel doesn't care
        key_len: Some(6),
        id: Some(1),
        flags: Some(libc::NFT_SET_TIMEOUT as u32),
        // userdata: b"\x00\x04\x02\x00\x00\x00",
        ..Default::default()
    }),
}.create());
batch.notify(NetfilterMessage {
    header: NetfilterHeader {
        resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
        ..Default::default()
    },
    body: NetfilterBody::BatchEnd,
}.request_no_ack());
batch.send()?;
```
FWIW this is a feature I'd love to see supported. Thanks for working on this @stbuehler :)
Hey @stbuehler do you think you'll pursue this PR now that #195 is in? I'd love to see batch messages landing so I might pick it up next week if you're not planning to come back to it.
Yeah sorry, got... distracted. Quite late here, so I probably should take another look tomorrow, but let's see how this is looking now. (Also should test whether my netfilter code still works with this.)
First of all: sorry, didn't work on this in a long time. I wasn't satisfied with how I split the changes into single commits, and I think this is looking better now.
Sadly I discovered a new design issue (I was working on similar things in Python the last two weeks, which made me think about it again):

The handling of `pending_requests` seems dangerous (#138): basically all messages to the kernel require `NLM_F_REQUEST`, and not adding `NLM_F_ACK` means the `pending_requests` entry will stay around forever unless the request returns an error or a "done" response (possibly after other messages). Imho this means you should always set `NLM_F_ACK` and hope the kernel honors the flag, just to make sure `pending_requests` is cleaned up.
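The proposed rule could be as simple as this (a hedged sketch; `outgoing_flags` is a hypothetical helper, not part of the crate, and the flag values are from `linux/netlink.h`):

```rust
// Netlink flag constants from linux/netlink.h.
const NLM_F_REQUEST: u16 = 0x01;
const NLM_F_ACK: u16 = 0x04;

/// Hypothetical helper: normalize the flags of every outgoing request so the
/// kernel is always asked for an ack. Without NLM_F_ACK, a request that
/// produces no reply would leave its pending_requests entry behind forever.
fn outgoing_flags(user_flags: u16) -> u16 {
    user_flags | NLM_F_REQUEST | NLM_F_ACK
}

fn main() {
    // Even if the caller only asked for NLM_F_REQUEST, NLM_F_ACK gets added.
    assert_eq!(outgoing_flags(NLM_F_REQUEST), NLM_F_REQUEST | NLM_F_ACK);
    assert_eq!(outgoing_flags(0), 0x05);
}
```

Of course this only helps where the kernel actually honors the flag, which is exactly where batches break the model (see below).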
Now regarding batch messages and `pending_requests`:
- The real bad news: neither `NFNL_MSG_BATCH_BEGIN` nor `NFNL_MSG_BATCH_END` respects `NLM_F_ACK`, i.e. the kernel will never send a message on success. A successful empty batch (i.e. just begin+end) returns no message at all... (and if the batch was aborted due to a failure of an inner message, the begin/end will still not fail...)
- On the other hand, even a `NFNL_MSG_BATCH_END` message can fail with an error (for example if it is missing `NLM_F_REQUEST`), and `NFNL_MSG_BATCH_BEGIN` can fail due to lots of problems.
- Some errors on single messages won't abort the full batch; instead all errors (and ACKs) for all messages are collected. But some errors (afaict: all BEGIN failures, OOM, ...) will abort the batch, which means there won't be any ACKs.
- Afaict an ACK on a single message in the batch doesn't mean the batch actually got committed.
Not sure how to handle this in a sane way. I think it will at least require a manual way to remove entries from `pending_requests` once we know they won't receive any messages anymore. And perhaps a flag for whether a "response" message was the last one in a datagram - because I think we will receive all errors/ACKs for a batch in a single datagram; once we see the last response in a datagram that matches the batch, we know we got all responses ("implicit ack").
Another way could be to use the same sequence number for all messages in a batch, and collect all responses for it from a single datagram before dropping the `pending_requests` entry (instead of dropping it on the first ACK). It would be a challenge to map errors to individual messages of the batch, but otoh that is probably not important anyway - the batch should either succeed or fail completely, at least in a first iteration of the feature :)
Just checked, and it doesn't seem like we get the errors in one go; so that is not an option. Next idea: use a dummy request (with a simple response) after the batch: as the kernel handles the messages in order, the response to the dummy request marks the end of all errors for the batch. We still need a way to stop waiting for sequence ids so we can detect problems with the BATCH_END message.
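The dummy-request idea could look roughly like this (a pure-Rust mockup with hypothetical names - `Response` and `collect_batch_errors` are not crate APIs): tag the batch's messages with their sequence numbers, append a sentinel request with its own sequence number, and treat the sentinel's reply as the end of all batch errors:

```rust
use std::collections::HashSet;

/// Mock of a parsed response: which request (by sequence number) it answers,
/// and whether it is an error. Real code would parse nlmsghdr/nlmsgerr.
struct Response {
    seq: u32,
    is_error: bool,
}

/// Drain responses until the sentinel's reply shows up. Any error seen for a
/// batch sequence number before that point belongs to the batch; once the
/// sentinel answers, batch seqs that never got a message can be dropped from
/// pending_requests ("implicit ack" - or the whole batch was aborted).
fn collect_batch_errors(
    responses: impl IntoIterator<Item = Response>,
    batch_seqs: &HashSet<u32>,
    sentinel_seq: u32,
) -> Vec<u32> {
    let mut errors = Vec::new();
    for r in responses {
        if r.seq == sentinel_seq {
            break; // the kernel handled the batch in order; nothing more will come
        }
        if r.is_error && batch_seqs.contains(&r.seq) {
            errors.push(r.seq);
        }
    }
    errors
}

fn main() {
    let batch: HashSet<u32> = [1, 2, 3].into_iter().collect();
    let responses = vec![
        Response { seq: 2, is_error: true },  // one inner message failed
        Response { seq: 9, is_error: false }, // sentinel reply ends the wait
        Response { seq: 3, is_error: true },  // never reached
    ];
    assert_eq!(collect_batch_errors(responses, &batch, 9), vec![2]);
}
```

This relies on the kernel processing messages in submission order, which is also the assumption behind using the dummy request in the first place; it still doesn't cover a BATCH_END that fails before the sentinel is even parsed.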
@stbuehler This PR has been in draft mode for a while. Is it ready for review?