p4runtime
p4runtime copied to clipboard
Reject multiple updates for the same key, if they are in the same batch
When the switch receives a batch of updates, the P4RT spec says that the switch must behave as if it executed all the updates in a particular order (which order is up to the switch). Therefore, if we have the following batch:
INSERT l3 table entry with key 192.168.1.1 MODIFY l3 table entry with key 192.168.1.1 DELETE l3 table entry with key 192.168.1.1
The switch has to return one of the following status messages: 1,2,3: OK, OK, OK 1,3,2: OK, OK, NOT_FOUND 2,1,3: NOT_FOUND, OK, OK 2,3,1: NOT_FOUND, NOT_FOUND, OK 3,1,2: NOT_FOUND, OK, OK 3,2,1: NOT_FOUND,NOT_FOUND,OK
Note that while some ordering above succeed completely or partially, sending multiple updates for the same key in one batch is always a bug in the controller, because the execution order of the updates always matters for the end result, but is also completely out of control of the controller.
If the controller is "lucky", the switch may execute the operations in the order required by the controller, but this is dangerous, because it is just hiding the bug, and any change to the switch's implementation details can resurface the bug.
To get around this problem, we propose that the switch should always reject all updates which operate on the same key. So the status for the example above would be: INVALID_ARGUMENT, INVALID_ARGUMENT, INVALID_ARGUMENT
We propose to only reject the updates that touch the same keys, so the following batch:
INSERT l3 table entry with key 192.168.1.1 INSERT l3 table entry with key 10.0.0.0 MODIFY l3 table entry with key 192.168.1.1 INSERT l3 table entry with key 10.0.0.1 DELETE l3 table entry with key 192.168.1.1
would return the result: INVALID_ARGUMENT, OK, INVALID_ARGUMENT, OK, INVALID_ARGUMENT
I'm fine with this change, but it does add some extra burden to the P4Runtime server, so would like to hear from others such as @bocon13, @ccascone
I assume that this doesn't just apply to table entries, but to all entities for which we have a notion of key (e.g. action profile members)?
Actually, this makes it easier to write a P4Runtime server in many cases. I have observed the implementation of two completely independent P4RT servers (in a real switch, not a toy example), and in both cases the current spec was increadibly hard to implement correctly for the developers, but this newly proposed scheme is easy.
Here's why: Most switches have a bunch of layers that the table entries pass through, and they are identified by their key on that path. That means it's hard/impossible to send multiple updates for the same key, so now you have to implement custom logic to not send these in parallel (because normally you want to send your updates in parallel for efficiency reasons). If you can just reject them outright at the top layer (as proposed here), then you can do that filtering quickly and easily, and then send everything in parallel.
The proposed change makes sense to me, and it shouldn't break any control plane use case. In ONOS, we already avoid sending updates for the same key in the same batch. I would agree with @stefanheule that this change simplifies the server implementation, but I defer to @pudelkoM and @bocon13 for a final comment.
The proposal and reasoning seem sensible.
I'm not sure to what degree Stratum currently is affected by this issue, as we process request updates in sequence and leave parallelism up to the underlying SDE/SDK.
I'm fine with this change, but it does add some extra burden to the P4Runtime server
Pre-rejecting updates would require some extra code and state keeping for us, but probably nothing major.
I guess there could be some pitfalls around comparing non-normalized bytestrings in keys, e.g., if the controller is insane and sends something like "\x00\x01"
vs "\x01"
for the same table key field.
I don't have an issues with this, but agree with @pudelkoM that it would require some additional development in Stratum in the common P4RT server implementation.
Regarding the bytestring issue that you describe, the spec mandates that bytestrings use the shortest string that encodes the value, which means leading 0's are invalid. We don't catch these today in Strartum, but that would be another form of validation that we should probably do (and return the appropriate errors to the client).
https://p4.org/p4runtime/spec/master/P4Runtime-Spec.html#sec-bytestrings
which means leading 0's are invalid
In my understanding RW symmetry is strongly encouraged, but not required. See Table 4. Examples of Valid Bytestring Encoding:
P4 type | Integer value | P4Runtime binary string | Read-write symmetry bit<12> | 99 (0x63) | \x00\x00\x63 | no
As the preceding examples illustrate, a P4Runtime server must accept a wide assortment of possible binary string encodings for the same integer value.
@pudelkoM good point. I guess what this means is that the P4RT server implementation would be required to:
- Pre-process entries -- transforming them into canonical bytestrings and then comparing keys to filter for duplicates
- Send a shorter list to the underlying SDK
- Integrate results from pre-processing failures and the underlying SDK responses
We don't do 1 & 3 in any form today, but I don't think it would be a significant burden to implement. As Max said, Stratum applies writes synchronously and sequentially, so we could keep track of this as local variables in the Write()
implementation.
I agree that this behavior is undefined today and explicitly disallowing this behavior seems reasonable (and doesn't really impose much of a burden).
In today's meeting, we had the following realization:
Instead of allowing the switch to execute batches in any order, we can also disallow the controller from sending batches where order matters.
The second is appealing because:
- it switch can still execute batches in any order
- controllers don't want undefined orderings of their batches (in all our experiences, this is always a bug), so rejecting those batches is helpful to the controller developer
From P4RT's perspective there are only two cases where order matters:
- a batch contains multiple updates which refer to the same key (e.g. modifying an entry that is also inserted)
- a batch contains action-profile-groups that refer to action-profile-members which are being updated in the same batch The proposal is to require rejection for all of these.
There are also ordering cases which P4RT doesn't know about:
- resource exhaustion (e.g. the order in which entries are inserted into a hash table/TCAM has implications on which ones will be rejected)
- additional ordering constraints, e.g. a switch may reject a flow which refers to a nexthop, before that nexthop is inserted The proposal is to still allow these forms of order dependencies.