p4-spec PSA Perhaps clarify PSA atomicity requirements with a few more examples

@antoninbas came up with this example P4 code (paraphrasing here, i.e. not worrying about syntax that passes the compiler cleanly):

// my_counter is an instance of a packet counter
@atomic {
    my_counter.count();
    my_counter.count();
}

If these were the only mentions of my_counter in a P4 program, should it be guaranteed that a control plane would only ever read even count values?

One possible answer is: yes, that is what the P4_16 and PSA specifications imply should happen, so if you have an implementation that cannot guarantee that (i.e. reading the counter from the control plane might in some cases produce an odd value), then your target's P4 compiler should reject the P4 program as not possible to implement.
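To make the question concrete, here is a toy model in Python (not P4, and not taken from either specification). It assumes a single counter, packets processed one at a time, and a hypothetical flag `block_is_atomic_to_control_plane` that says whether a control-plane read can land between the two `count()` calls. It is only a sketch of the two possible answers, not a description of any real target.

```python
from typing import Set


def observable_values(num_packets: int,
                      block_is_atomic_to_control_plane: bool) -> Set[int]:
    """Return every counter value a control-plane read could observe."""
    seen = {0}      # a read that happens before any packet arrives
    counter = 0
    for _ in range(num_packets):
        counter += 1                    # first my_counter.count()
        if not block_is_atomic_to_control_plane:
            seen.add(counter)           # a read lands between the two calls
        counter += 1                    # second my_counter.count()
        seen.add(counter)               # a read lands after the block
    return seen


strict = observable_values(3, True)     # block atomic w.r.t. control plane
relaxed = observable_values(3, False)   # block atomic w.r.t. data plane only

print(sorted(strict))
print(sorted(relaxed))
```

Under the strict interpretation every observable value is even; under the relaxed one, odd values become observable, which is the case a compiler would have to reject if the "yes" answer above is adopted.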

Nov 15 '18 19:11 jafingerhut

I have re-read the concurrency section of the language specification. I think the current text strongly suggests that reading the counter should always return an even value, but it doesn't explicitly say that the operations should be atomic with respect to control-plane operations.

So I'd be in favor of settling this (either way) and clarifying the specification text.

Nov 16 '18 01:11 jnfoster

I think that imposing atomicity for operations that involve the control plane is going to significantly restrict the data plane programs that can be accepted. And it is not clear to me how one would build a data plane program that can run with different NOS implementations.

I am not familiar with production-level NOS implementations, but from what I hear, folks go to extreme lengths to ensure consistent behavior, and use various methods. Enforcing such behavior in a P4 compiler is ... hopeless?

Nov 16 '18 01:11 cc10512

See https://github.com/p4lang/p4runtime/pull/90

Nov 16 '18 02:11 jnfoster

I'm not sure I follow that example: how can we call this 'atomic'

@atomic {
        r1.write((Index_t)1, (Value_t)100);
        r1.write((Index_t)2, (Value_t)100);
}

when these behaviors are allowed:

 * `r1[1] == 100` and `r1[2] == 0`
 * `r1[1] == 100` and `r1[2] == 100`

It is only the ordering of the statements then, not the atomic visibility of the two indices to an outside reader?

Nov 16 '18 02:11 cc10512

There's atomic from the perspective of the data plane (what happens to two packets being processed concurrently) vs. control plane (what happens to two control plane operations being processed as packets are being processed). In addition, the control plane has "mega" operations like reading all values from a register that add additional complications.

Nov 16 '18 02:11 jnfoster

As far as I am aware from the public P4_16, PSA, and P4Runtime API specifications today, while there are "batch" or "mega" operations that can be invoked from a controller via the P4Runtime API that can access many entries in a table, or many entries in a register array, the PSA says that these only need be atomic relative to data plane operations on a per-individual-operation basis, not on the batch as a whole. At least when I was writing the parts of the PSA spec on this, I was assuming that while individual entry reads/writes should be atomic between data plane and control plane, anything larger than that should not be required to be atomic between them, because that just sounds impractically complex to implement (exception: if you stop all data packet processing for long enough, it is trivially easy :-)

I have not yet read through the code samples in p4lang/p4runtime#90, but will soon-ish.

Nov 16 '18 02:11 jafingerhut

@cc10512 I have modified that example a little bit, but Nate and Andy have the correct interpretation. P4Runtime has wildcard operations (e.g. to read an entire register). Conceptually these wildcard operations can be mapped to a set of individual read operations. The individual reads are atomic relative to packet processing but not the wildcard read (or "mega" read) as a whole. It is very similar to a P4Runtime batch of individual reads and I would say the atomicity guarantees are the same.
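A small Python model may help show why a wildcard read can return a "mixed" result even though the `@atomic` block is never split. It assumes the register starts at 0 for both indices, that the wildcard read is decomposed into one read per index issued in an arbitrary order, and that the data-plane block runs as a single step somewhere before, between, or after those reads. The function names are made up for illustration and none of this is P4Runtime API.

```python
from itertools import permutations

INITIAL = {1: 0, 2: 0}   # assumed initial contents of r1[1] and r1[2]


def atomic_block(reg):
    # Models the @atomic block: both writes applied as one indivisible step.
    reg[1] = 100
    reg[2] = 100


def wildcard_read_outcomes():
    """Enumerate (r1[1], r1[2]) results of a wildcard read done per entry."""
    outcomes = set()
    for order in permutations(INITIAL):            # order of per-entry reads
        for block_pos in range(len(order) + 1):    # reads done before block
            reg = dict(INITIAL)
            result = {}
            for i, index in enumerate(order):
                if i == block_pos:
                    atomic_block(reg)              # whole block, never split
                result[index] = reg[index]         # one atomic per-entry read
            outcomes.add((result[1], result[2]))
    return outcomes


outcomes = wildcard_read_outcomes()
print(sorted(outcomes))
```

In this model `(100, 0)` shows up when index 2 is read before the block runs and index 1 after it, so each individual read is consistent with the block being atomic while the wildcard read as a whole is not a snapshot.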

Nov 16 '18 22:11 antoninbas

I have now read through Antonin's updated example at https://github.com/p4lang/p4runtime/pull/90 and with his most recent commit, the text seems pretty clear that multi-table-entry or multi-register-array-entry operations are atomic relative to the data plane on a per-entry basis. That seems like something much easier to implement than any stronger requirements.

If a control plane writer wants something stronger, they can stop traffic, or use "atomic pointer flipping" techniques that are pretty well known in the industry, e.g. [1].

As a side detail, I give that reference [1] not because it is when these techniques were first known to anybody, but because it is a public reference I can point out to everyone, and is clearly written and explained, rather than more-cryptic and proprietary older stuff. Thanks for publishing it!

[1] Pavol Cerny, Nate Foster, Nilesh Jagnik, and Jedidiah McClurg, “Optimal Consistent Network Updates in Polynomial Time”. International Symposium on Distributed Computing (DISC), Paris, France, September 2016.

Nov 16 '18 22:11 jafingerhut

Wow, that's so cool! We had no idea this paper was re-inventing something well known in industry. This was Nilesh's undergrad research project, advised by Pavol and Jed.

-N

Nov 17 '18 03:11 jnfoster

I should clarify a bit -- I have seen the technique used within a single device between tables, to make the table updates appear atomic. I had maybe only once or twice seen it proposed across multiple devices in a network, but the idea is pretty much the same in either case.
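For readers who have not seen the "atomic pointer flipping" idea mentioned above, here is a minimal single-device sketch in Python. It assumes two full copies of a table and one selector value; the class and method names (`VersionedTable`, `latch_version`, and so on) are hypothetical, and real targets will differ in how a packet latches the version and when the old copy can be reused.

```python
class VersionedTable:
    """Two copies of a table plus one 'active version' selector."""

    def __init__(self, entries):
        self._copies = [dict(entries), dict(entries)]
        self._active = 0            # the single value that gets flipped

    def latch_version(self):
        # Data plane: each packet reads the selector once, on arrival.
        return self._active

    def lookup(self, key, version):
        # Data plane: every lookup for that packet uses the latched copy.
        return self._copies[version].get(key)

    def update(self, new_entries):
        # Control plane: fill the inactive copy, possibly entry by entry,
        # while packets keep using the active copy.
        shadow = 1 - self._active
        self._copies[shadow] = dict(new_entries)
        # Publish everything with one single-entry (hence atomic) write.
        self._active = shadow


table = VersionedTable({"a": 1, "b": 1})
v_old = table.latch_version()           # packet that arrived before the flip
table.update({"a": 2, "b": 2})
v_new = table.latch_version()           # packet that arrived after the flip

old_view = (table.lookup("a", v_old), table.lookup("b", v_old))
new_view = (table.lookup("a", v_new), table.lookup("b", v_new))
print(old_view, new_view)
```

Each packet sees either all of the old entries or all of the new ones, using only the per-entry atomicity discussed earlier. The sketch ignores one practical detail: before a second update overwrites the old copy, the control plane has to wait until no in-flight packet still holds the old version.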

Nov 17 '18 03:11 jafingerhut