zenoh Update ACL config format to support AND/OR logic between subjects

This PR reworks the ACL config to support boolean combinations of subjects. An examples of the new config format is the following:

{
  "access_control":
    {
      "enabled": true,
      "default_permission": "deny",

      "rules":
      [
        {
          "id": "allow pub/sub ingress on test/demo",
          "permission": "allow",
          "flows": ["ingress"],
          "actions": [
            "put",
            "declare_subscriber"
          ],
          "key_exprs": [
            "test/demo"
          ]
        },
        {
          "id": "allow get/queryable test/demo ingress/egress",
          "permission": "allow",
          "flows": ["ingress", "egress"],
          "actions": [
            "get",
            "declare_queryable"
          ],
          "key_exprs": [
            "test/demo"
          ]
        },
      ],

      "subjects":
      [
        {
          "id": "client1 or client2 on domain1 through en0/en1",
          "interfaces": [
            "en0",
            "en1",
          ],
          "cert_common_names": [
            "domain1.local"
          ],
          "usernames": [
            "client1name",
            "client2name"
          ]
        },
        {
          "id": "domain2",
          "cert_common_names": [
            "domain2.local"
          ]
        },
        {
          "id": "all interfaces"
          // no fields defined = wildcard (all interfaces)
        }
      ],

      "policy":
      [
        {
          "rules": ["allow pub/sub ingress on test/demo"],
          "subjects": [
            "client1 or client2 on domain1 through en0/en1",
            "domain2"
          ]
        },
        {
          "rules": ["allow get/queryable test/demo ingress/egress"],
          "subjects": [
            "all interfaces",
          ]
        },
      ]
    }
  }

Within each subject, a cartesian product is performed to produce the (interface, cert_common_name, username) combinations. Each combination is a logical AND between its components, and different combinations within the same subject in the subjects list represent a logical OR between them.

Rules are declared seperately, and applied to these subject logical combinations in the policy list. Unique identifiers (id fields) are used to represent the subjects and rules in the policy entries.

Jun 27 '24 17:06 oteffahi

@Mallets This is ready for review.

Jun 27 '24 17:06 oteffahi

I have got a question, what if we do not know a priori the interfaces? can an empty list or setting them to none apply the configuration on all interfaces?

Jul 17 '24 07:07 gabrik

Another question, I see that we can specify common names, so we can imagine that a user can create certificates for its clients with whathever common name, but seems that the router refuses to open the mTLS session in that case.

Can not accept TLS connection: received fatal alert: BadCertificate
2024-07-17T07:08:04.592149Z  WARN acc-0 ThreadId(36) zenoh_link_quic::unicast: QUIC acceptor failed: ConnectionClosed(ConnectionClose { error_code: Code::crypto(2a), frame_type: None, reason: b"invalid peer certificate: NotValidForName" }) at io/zenoh-links/zenoh-link-quic/src/unicast.rs:377.
2024-07-17T07:08:04.592192Z  WARN acc-0 ThreadId(36) zenoh_link_quic::unicast: QUIC acceptor failed: ConnectionClosed(ConnectionClose { error_code: Code::crypto(2a), frame_type: None, reason: b"invalid peer certificate: NotValidForName" }) at io/zenoh-links/zenoh-link-quic/src/unicast.rs:377. Hint: increase the system open file limit.

Jul 17 '24 07:07 gabrik

Another question, I see that we can specify common names, so we can imagine that a user can create certificates for its clients with whathever common name, but seems that the router refuses to open the mTLS session in that case.

Can not accept TLS connection: received fatal alert: BadCertificate
2024-07-17T07:08:04.592149Z  WARN acc-0 ThreadId(36) zenoh_link_quic::unicast: QUIC acceptor failed: ConnectionClosed(ConnectionClose { error_code: Code::crypto(2a), frame_type: None, reason: b"invalid peer certificate: NotValidForName" }) at io/zenoh-links/zenoh-link-quic/src/unicast.rs:377.
2024-07-17T07:08:04.592192Z  WARN acc-0 ThreadId(36) zenoh_link_quic::unicast: QUIC acceptor failed: ConnectionClosed(ConnectionClose { error_code: Code::crypto(2a), frame_type: None, reason: b"invalid peer certificate: NotValidForName" }) at io/zenoh-links/zenoh-link-quic/src/unicast.rs:377. Hint: increase the system open file limit.

As far as I recall mTLS is not supported in QUIC

Jul 17 '24 07:07 Mallets

As far as I recall mTLS is not supported in QUIC

It is: https://github.com/eclipse-zenoh/zenoh/pull/899

Jul 17 '24 07:07 gabrik

@fuzzypixelz could you please explain why you rewrote all the subject combinations logic? The previous logic worked and supported adding any new Subject type with very minor changes, and relied on a trie to avoid a sequential search within a list, which is the case of your newer implementation.

Jul 19 '24 09:07 oteffahi

@fuzzypixelz could you please explain why you rewrote all the subject combinations logic? The previous logic worked and supported adding any new Subject type with very minor changes, and relied on a trie to avoid a sequential search within a list, which is the case of your newer implementation.

When my review was requested I noticed that subject.binary_search(s).is_err() which led to me find that:

If a subject specifies a “valid” interface and an “invalid” certification common name, then the subject is incorrectly authenticated with the “invalid” certification common name, even though it mismatches its actual common name. Same thing happens with “valid” certification common name and an “invalid” username or a “valid” interface and an “invalid” username.
If no subject property is defined (i.e. username, certificate common name or interface), then the result is a wildcard on all interfaces. This should arguably be a wildcard on all subject properties, not just interfaces.
If at least one, but not all subject properties are defined, the result is not a wildcard on the undefined subject properties. There should arguably be a wildcard on all missing subject properties to be more consistent.

I rewrote it to ensure that (1) doesn't occur, and ended up addressing (2) and (3) also to have a generic handling of wildcards. I believed that the root issue was the Trie data structure (it searches ordered subject properties by prefix). Perhaps the typo in the binary search call would've addressed this, but I'm not sure.

It was also not clear to me that it was necessarily faster (subject properties were sorted, one prefix search was performed for each subject property, possibly a binary search for each queried subject property and relative cache unfriendliness of the Trie).

Ultimately I had ensure the implementation is sound in a timely manner so I wrote the simplest code I could think of. Needless to say, it is not up to me to decide on the final implementation.

Jul 19 '24 12:07 fuzzypixelz

I just tested it a bit and now it works as expected, I just have one minor comment, the current implementation is always an AND in the subject, it would be interesting to configure the behavior between AND and OR?

Jul 22 '24 06:07 gabrik

Also, queryable and get actions are not clear, like a queryable must have the permissions to receive get and send reply, in the current implementation there is no such reply action.

I would expect rules like (pseudocode, all allows), from the PoV of a router

queryable: 
   - declare_queryable: ingress
   - get: egress
   - reply: ingress
getter:
  - declare_queryable: egress
  - get: ingress
  - reply: egress

in current implementation we have

queryable:
 - declare_queryable: ingress
 - get: egress
 getter:
  - declare_queryable: egress
  - get: ingress

Jul 22 '24 08:07 gabrik

I just tested it a bit and now it works as expected, I just have one minor comment, the current implementation is always an AND in the subject, it would be interesting to configure the behavior between AND and OR?

@gabrik OR is expressed in the policy array by having one rule apply to multiple subjects that are constructed in a way to express a logical OR.

Jul 22 '24 09:07 oteffahi

It is an array of policy , so policies

In SOTA, the access control policy (singular) is the set of rules applied to subjects, so calling it policies would be confusing in that regard. The whole configuration creates the ACL policy. EDIT: policies will be added to align the naming in the config to all plural.

Jul 22 '24 09:07 oteffahi

Also, queryable and get actions are not clear, like a queryable must have the permissions to receive get and send reply, in the current implementation there is no such reply action.

I would expect rules like (pseudocode, all allows), from the PoV of a router
queryable: 
   - declare_queryable: ingress
   - get: egress
   - reply: ingress
getter:
  - declare_queryable: egress
  - get: ingress
  - reply: egress 
in current implementation we have
queryable:
 - declare_queryable: ingress
 - get: egress
 getter:
  - declare_queryable: egress
  - get: ingress

@gabrik Reply actions were excluded from the initial ACL design because we assumed that allowing/denying a get means allowing/denying its associated Reply even if it is not on the same key_expr. This can be changed in a future PR if deemed useful for certain applications. @OlivierHecart what do you think about this? EDIT: Reply messages will be added to ACL in a future PR, following the change of perspective from actions to messages.

Jul 22 '24 09:07 oteffahi

@gabrik Reply actions were excluded from the initial ACL design because we assumed that allowing/denying a get means allowing/denying its associated Reply even if it is not on the same key_expr

I'm against implicit things, because if that applies for get then also a declare_subscribe should imply the put

But I'm fine that to have this behavior for 1.0.0 and change for the next one.

Jul 22 '24 15:07 gabrik

I would rename "declare_subscriber" to "subscriber_declaration", "declare_queryable" to "queryable_declaration", "put" to "publication" and "get" to "query".

The idea is to use nouns instead of verbs now that we think of these as data and not actions.

This is more a matter of opinion so overall this LGTM 🚀

If we move to messages than the messages are declare_subscriber, declare_queryable, put, and get. Those messages have the same name of the exposed API used by applications, making it quite easy to correlate messages to API for the user.

Jul 23 '24 09:07 Mallets

I would rename "declare_subscriber" to "subscriber_declaration", "declare_queryable" to "queryable_declaration", "put" to "publication" and "get" to "query". The idea is to use nouns instead of verbs now that we think of these as data and not actions. This is more a matter of opinion so overall this LGTM 🚀

If we move to messages than the messages are declare_subscriber, declare_queryable, put, and get. Those messages have the same name of the exposed API used by applications, making it quite easy to correlate messages to API for the user.

@Mallets If we're talking about the API names then let's not forget that Publisher::put returns a Publication. That a queryable receives Query objects (we don't say it receives a "get"). That Subscriber::undeclare() returns a SubscriberUndeclaration (there is no SubscriberDeclaration type but that's more of an inconsistency in the API naming convention).

It is not enough for the message names to reflect the API of the subject impacted by ingress rules. Subjects impacted by egress rules also ought to be considered.

If I was a user it would be immediately clear to me that a rule with on message get would impact my Session:get but I would very surprised to learn that it also impacts my Queryable::recv_async.

The message names cannot match the API calls of subjects on both ingress and egress sides. If they match either one, then the naming is misleading for the other. Thus the names ought to be abstracted over both.

Jul 23 '24 09:07 fuzzypixelz

I would rename "declare_subscriber" to "subscriber_declaration", "declare_queryable" to "queryable_declaration", "put" to "publication" and "get" to "query". The idea is to use nouns instead of verbs now that we think of these as data and not actions. This is more a matter of opinion so overall this LGTM 🚀

If we move to messages than the messages are declare_subscriber, declare_queryable, put, and get. Those messages have the same name of the exposed API used by applications, making it quite easy to correlate messages to API for the user.

@Mallets If we're talking about the API names then let's not forget that Publisher::put returns a Publication. That a queryable receives Query objects (we don't say it receives a "get"). That Subscriber::undeclare() returns a SubscriberUndeclaration (there is no SubscriberDeclaration type but that's more of an inconsistency in the API naming convention).

It is not enough for the message names to reflect the API of the subject impacted by ingress rules. Subjects impacted by egress rules also ought to be considered.

If I was a user it would be immediately clear to me that a rule with on message get would impact my Session:get but I would very surprised to learn that it also impacts my Queryable::recv_async.

The message names cannot match the API calls of subjects on both ingress and egress sides. If they match either one, then the naming is misleading for the other. Thus the names ought to be abstracted over both.

The mapping has nothing to do with Rust types but with the Zenoh API and semantics. Here we are defining a config that can be loaded by a C/C++/Python/Java/Kotlin application. All those languages consistently define declare_subscriber, declare_queryable, etc. operations. So, IMHO restricting the scope to Rust type and driving the discussion around that would be a mistake.

Jul 23 '24 10:07 Mallets

with reserve to improve the semantic in the future.

i.e, split between get and query

You mean get and reply?

Jul 23 '24 10:07 oteffahi

with reserve to improve the semantic in the future. i.e, split between get and query

You mean get and reply?

Yup

Jul 23 '24 10:07 gabrik

The mapping has nothing to do with Rust types but with the Zenoh API and semantics.

@Mallets My argument wasn't centered around the Rust API names, I was just giving examples. Consider the following ACL rule:

{
  "id": "deny-get-egress",
  "flows": ["egress"],
  "messages": ["get"],
  "permission": "deny",
  "key_exprs": ["test/1"]
}

If deny-get-egress is applied to subject S in a router R for instance, then R would block any queries matching test/1 from reaching S.

What are the semantics of denying "get" messages[^1] from reaching S? The word "get" refers to the Zenoh API operation get using your wording.

Here we are trying to configure whether or not queries can reach S, the semantics are more clearly expressed in terms of data in this case, not API operations. Thus "operations" do not capture the semantics of messages in all cases (granted they do for ingress flows).

[^1]: If we're being pedantic here, there is not such thing as a "get" message, it is an operation.

Jul 23 '24 12:07 fuzzypixelz

zenoh zenoh copied to clipboard

Update ACL config format to support AND/OR logic between subjects

zenoh
zenoh copied to clipboard