
Support batch messages

Open stbuehler opened this issue 3 years ago • 8 comments

This is motivated by netfilter; changes to netfilter must be done through a series of messages (started by NFNL_MSG_BATCH_BEGIN, ended by NFNL_MSG_BATCH_END). The full batch needs to be submitted to the kernel in one write/sendto/..., otherwise the kernel will abort the batch.
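For readers unfamiliar with the constraint, here is a rough sketch (not this crate's API) of what that single buffer looks like on the wire. The 16-byte nlmsghdr layout and the NFNL_MSG_BATCH_BEGIN (0x10) / NFNL_MSG_BATCH_END (0x11) message types come from the kernel headers; the function names and the zeroed nfgenmsg payload are purely illustrative:

```rust
const NLMSG_HDRLEN: usize = 16;
const NLM_F_REQUEST: u16 = 0x01;
const NFNL_MSG_BATCH_BEGIN: u16 = 0x10; // from linux/netfilter/nfnetlink.h
const NFNL_MSG_BATCH_END: u16 = 0x11;

/// Serialize one netlink message: 16-byte header + 4-byte-aligned payload.
fn nlmsg(msg_type: u16, flags: u16, seq: u32, payload: &[u8]) -> Vec<u8> {
    let len = (NLMSG_HDRLEN + payload.len()) as u32;
    let mut buf = Vec::with_capacity(len as usize);
    buf.extend_from_slice(&len.to_ne_bytes());      // nlmsg_len
    buf.extend_from_slice(&msg_type.to_ne_bytes()); // nlmsg_type
    buf.extend_from_slice(&flags.to_ne_bytes());    // nlmsg_flags
    buf.extend_from_slice(&seq.to_ne_bytes());      // nlmsg_seq
    buf.extend_from_slice(&0u32.to_ne_bytes());     // nlmsg_pid
    buf.extend_from_slice(payload);
    while buf.len() % 4 != 0 { buf.push(0); }       // NLMSG_ALIGN padding
    buf
}

/// Concatenate BEGIN + inner messages + END into the single buffer
/// that has to go out in one write/sendto.
fn batch(inner: &[Vec<u8>], seq_start: u32) -> Vec<u8> {
    // nfgenmsg payload for BEGIN/END: family, version, res_id.
    // Zeroed here as a sketch; the real res_id carries the subsystem id.
    let nfgen = [0u8; 4];
    let mut out = nlmsg(NFNL_MSG_BATCH_BEGIN, NLM_F_REQUEST, seq_start, &nfgen);
    for m in inner {
        out.extend_from_slice(m);
    }
    let end_seq = seq_start + 1 + inner.len() as u32;
    out.extend_from_slice(&nlmsg(NFNL_MSG_BATCH_END, NLM_F_REQUEST, end_seq, &nfgen));
    out
}
```

Splitting this buffer across multiple sendto calls is exactly what the kernel rejects: it aborts the batch when BEGIN and END don't arrive in the same read.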

stbuehler avatar Jun 06 '21 16:06 stbuehler

@stbuehler Thanks for the patch. Could you provide test or example code helping me understand how this works in upper level?

cathay4t avatar Jun 14 '21 00:06 cathay4t

Hi. I'm working on some netfilter code; I don't think it's ready to show yet.

But here is a small example. Netfilter changes need to be done in a "batch" request; it starts with a BatchBegin and ends with a BatchEnd message. The kernel wants to read the end message from the same "buffer" as the begin message (I suppose this is so userspace can't stall the "lock" the kernel holds during a change). The batch is also tied to a specific netfilter subsystem.

let mut batch = handle.batch(SocketAddr::new(0, 0));
batch.notify(NetfilterMessage {
	header: NetfilterHeader {
		resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
		..Default::default()
	},
	body: NetfilterBody::BatchBegin,
}.request_no_ack());
let c_table_response = batch.request(NetfilterMessage {
	header: NetfilterHeader {
		family: Family::INET,
		..Default::default()
	},
	body: NetfilterBody::NewTable(NftTable {
		name: Some(std::ffi::OsString::from("filter")),
		flags: Some(0),
		..Default::default()
	}),
}.create());
let c_table_set = batch.request(NetfilterMessage {
	header: NetfilterHeader {
		family: Family::INET,
		..Default::default()
	},
	body: NetfilterBody::NewSet(NftSet {
		table: Some(std::ffi::OsString::from("filter")),
		name: Some(std::ffi::OsString::from("allow")),
		key_type: Some(9), // "ether_addr" in userspace nft; kernel doesn't care
		key_len: Some(6),
		id: Some(1),
		flags: Some(libc::NFT_SET_TIMEOUT as u32),
		// userdata: b"\x00\x04\x02\x00\x00\x00",
		..Default::default()
	}),
}.create());
batch.notify(NetfilterMessage {
	header: NetfilterHeader {
		resource_id: (libc::NFNL_SUBSYS_NFTABLES as u16) * 256,
		..Default::default()
	},
	body: NetfilterBody::BatchEnd,
}.request_no_ack());
batch.send()?;

stbuehler avatar Jun 14 '21 21:06 stbuehler

FWIW this is a feature I'd love to see supported. Thanks for working on this @stbuehler :)

little-dude avatar Nov 08 '21 09:11 little-dude

Hey @stbuehler do you think you'll pursue this PR now that #195 is in? I'd love to see batch messages landing so I might pick it up next week if you're not planning to come back to it.

little-dude avatar Nov 25 '21 19:11 little-dude

> Hey @stbuehler do you think you'll pursue this PR now that #195 is in? I'd love to see batch messages landing so I might pick it up next week if you're not planning to come back to it.

Yeah sorry, got... distracted. Quite late here, so I probably should take another look tomorrow, but let's see how this is looking now. (Also should test whether my netfilter code still works with this.)

stbuehler avatar Nov 25 '21 23:11 stbuehler

First of all: sorry, I didn't work on this for a long time. I wasn't satisfied with how I had split the changes into individual commits, and I think it looks better now.

Sadly I discovered a new design issue (I was working on similar things in Python for the last two weeks, which made me think about it again): the handling of pending_requests seems dangerous (#138). Basically all messages to the kernel require NLM_F_REQUEST, but without NLM_F_ACK an entry in pending_requests will stay around forever unless the request happens to return an error or a "done" response (possibly after other messages). Imho this means you should always set NLM_F_ACK and hope the kernel honors the flag, just to make sure pending_requests gets cleaned up.

Now regarding batch messages and pending_requests:

  • The really bad news is: neither NFNL_MSG_BATCH_BEGIN nor NFNL_MSG_BATCH_END respects NLM_F_ACK - i.e. the kernel will never send a message on success. A successful empty batch (i.e. just begin+end) returns no message at all... (and if the batch was aborted due to a failure of an inner message, the begin/end will still not fail...)
  • On the other hand, even a NFNL_MSG_BATCH_END message can fail with an error (for example if it is missing the NLM_F_REQUEST flag), and NFNL_MSG_BATCH_BEGIN can fail for lots of reasons.
  • Some errors on single messages won't abort the full batch; instead the kernel keeps processing and reports the errors (and ACKs) for all messages. But some errors (afaict: all BEGIN failures, OOM, ...) will abort the batch, in which case there won't be any ACKs at all.
  • Afaict an ACK on a single message in the batch doesn't mean the batch actually got committed.

Not sure how to handle this in a sane way. At a minimum it will require a manual way to remove entries from pending_requests once we know they won't receive any more messages. And perhaps a flag indicating whether a "response" message was the last one in a datagram - because I think we will receive all errors/ACKs for a batch in a single datagram; once we have seen the last response in a datagram that matches the batch, we know we got all responses ("implicit ack").

Another way could be using the same sequence number for all messages in a batch, and collecting all responses for it from a single datagram before dropping the pending_requests entry (instead of dropping it on the first ACK). It would be a challenge to map errors back to individual messages of the batch, but otoh that is probably not important anyway - the batch should either succeed or fail completely, at least in a first iteration of the feature :)
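A minimal sketch of that idea (all names hypothetical, none of this is the crate's actual API): one pending entry per batch, keyed by the shared sequence number, accumulating error codes and dropped only when the end-of-datagram flag fires:

```rust
use std::collections::HashMap;

/// Hypothetical tracker for the "one seq per batch" scheme.
#[derive(Default)]
struct PendingBatches {
    // seq -> error codes collected so far (an ACK is error code 0)
    pending: HashMap<u32, Vec<i32>>,
}

impl PendingBatches {
    fn register(&mut self, seq: u32) {
        self.pending.insert(seq, Vec::new());
    }

    /// Feed one NLMSG_ERROR/ACK response. `last_in_datagram` is the
    /// flag discussed above; only then is the entry dropped and the
    /// overall batch result reported.
    fn response(
        &mut self,
        seq: u32,
        error: i32,
        last_in_datagram: bool,
    ) -> Option<Result<(), Vec<i32>>> {
        let codes = self.pending.get_mut(&seq)?;
        if error != 0 {
            codes.push(error);
        }
        if last_in_datagram {
            // all responses for this batch arrived in this datagram
            let codes = self.pending.remove(&seq).unwrap();
            return Some(if codes.is_empty() { Ok(()) } else { Err(codes) });
        }
        None
    }
}
```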

stbuehler avatar Apr 10 '22 14:04 stbuehler

Just checked, and it doesn't seem like we get the errors in one go, so that is not an option. Next idea: send a dummy request (with a simple response) after the batch: since the kernel handles the messages in order, the response to the dummy request marks the end of all errors for the batch. We still need a way to stop waiting on sequence ids so we can detect problems with the BATCH_END message.
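The sentinel idea could look roughly like this (purely illustrative names; the dummy request itself would be whatever cheap GET the subsystem offers). Because the kernel answers in order, seeing the dummy's response means all errors for the batch's sequence numbers have already been delivered:

```rust
/// Given the batch's seq range and the dummy request's seq, scan a
/// stream of (seq, error) responses and decide the batch outcome once
/// the sentinel response shows up. A missing sentinel response means
/// BATCH_END (or the socket) failed and we must not keep waiting.
fn batch_outcome(
    batch_seqs: std::ops::Range<u32>,
    sentinel_seq: u32,
    responses: impl IntoIterator<Item = (u32, i32)>,
) -> Option<Result<(), Vec<i32>>> {
    let mut errors = Vec::new();
    for (seq, err) in responses {
        if seq == sentinel_seq {
            // everything before this belonged to the batch; we're done
            return Some(if errors.is_empty() { Ok(()) } else { Err(errors) });
        }
        if batch_seqs.contains(&seq) && err != 0 {
            errors.push(err);
        }
    }
    None // sentinel never answered
}
```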

stbuehler avatar Apr 15 '22 10:04 stbuehler

@stbuehler This PR has been in draft mode for a while. Is it ready for review?

cathay4t avatar May 07 '22 00:05 cathay4t