old_mixer_repo icon indicating copy to clipboard operation
old_mixer_repo copied to clipboard

Attribute producing adapter based on Sets/Maps stored in CRDs

Open louiscryan opened this issue 7 years ago • 22 comments

@geeknoid @mandarjog @rshriram

The Mixer expression language is quite powerful but implementing large set membership or large keyspace maps would be extremely unwieldly. I think this would be a relatively cheap yet powerful feature to deliver in 0.2.

I propose we add:

  • A standard CRD mechanism for defining large sets and maps (with types?)
  • Standard namespace ACL controls for writing these maps/sets
  • A simple built in attribute producing adapter that uses these sets or maps to produce values keyed by other attributes in the context
  • A simple 'Predicate' Check adapter which uses a the Mixer expression language to allow or deny check requests

This functionality would provide a generic mechanism for:

  • whitelisting and blacklisting
  • looking up parameters to downstream adapters. E.g distinct quota limits
  • Defining policy maps which affect which handlers are selected based on context

E.g logical flow

CRD: name:Admins, type:Set -> ("steve", "dana", "rachel") Attribute Producing adapter: isAdmin = request.user in 'Admins' Check : allow request if isAdmin

Maybe someone in the community could pick it up to help validate the Mixer adapter architecture stuff (that someone could even be me :)

louiscryan avatar Aug 10 '17 18:08 louiscryan

@timothyhinrichs did you guys provide any in-language mechanisms for large sets/maps in OPA or was the intent to handle this out-of-band using extensions points like what I described above. Would be good to be lined up on this.

@tristonianjones largely same question based on Firebase experience ?

louiscryan avatar Aug 10 '17 18:08 louiscryan

We have part of what we need for this today: the generic list checker takes a map from its configuration and checks that the result of an expression is in its set (or not in it, for blacklists).

If we push Mixer config into CRDs (AFIAK @jmuk is working on this now) then technically we can do this as soon as Mixer CRDs land, with no other changes. If we want do something like read a configmap in the adapter as the input, then some code changes are required.

ZackButcher avatar Aug 10 '17 18:08 ZackButcher

I've wanted to create a generic map-producing Attributes adapter for a while as I think it's generally useful. Such an adapter doesn't exist today and could be used to help solve this case too.

ZackButcher avatar Aug 10 '17 18:08 ZackButcher

I think list checker might be a little too constraining as it couples the value lookup to the behavior in a limiting way. i.e. can't use a predicate expression that combines the value of the lookup with other attributes in the context.

On Thu, Aug 10, 2017 at 11:31 AM, Zack [email protected] wrote:

I've wanted to create a generic map-producing Attributes adapter for a while as I think it's generally useful. Such an adapter doesn't exist today and could be used to help solve this case too.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/istio/mixer/issues/1049#issuecomment-321635885, or mute the thread https://github.com/notifications/unsubscribe-auth/AIoKPC6FcgVZGcwsyMoJN4GojDj2Xsakks5sW0yAgaJpZM4OzzUa .

louiscryan avatar Aug 10 '17 18:08 louiscryan

OPA does more than storing sets with datalog-style reasoning. Can Mixer do a join operation across two tables?

kyessenov avatar Aug 10 '17 18:08 kyessenov

I guess one question here is whether the sets/maps are materialized into the context and treated as first-class constructs or whether we want to maintain some decoupling because we don't want to commit to materialization.

I don't know if the Mixer language will be ready in 0.2 to handle them as first-class so we may need to keep the decoupling (i.e using an attribute producing adapter). We can always revisit in 0.3 as we work on OPA alignment etc.

On Thu, Aug 10, 2017 at 11:34 AM, Kuat [email protected] wrote:

OPA does more than storing sets with datalog-style reasoning. Can Mixer do a join operation across two tables?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/istio/mixer/issues/1049#issuecomment-321636551, or mute the thread https://github.com/notifications/unsubscribe-auth/AIoKPEduG5h1LkHJa-wmy8hNVZQNTUq7ks5sW00igaJpZM4OzzUa .

louiscryan avatar Aug 10 '17 18:08 louiscryan

I think Mixer already materializes all data sources. This is really a database problem, and can be improved as we get to larger scale and need multi-level caching.

kyessenov avatar Aug 10 '17 19:08 kyessenov

Mixer 0.2 is pretty much locked down, we still have a number of things to deliver. So let's talk about this in the context of 0.3. This will give us more time to design this feature properly.

On Thu, Aug 10, 2017 at 12:47 PM, Kuat [email protected] wrote:

I think Mixer already materializes all data sources. This is really a database problem, and can be improved as we get to larger scale and need multi-level caching.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/istio/mixer/issues/1049#issuecomment-321654613, or mute the thread https://github.com/notifications/unsubscribe-auth/AVucHXOrMgHELtWep9GKh4VzKdVJM1uxks5sW15qgaJpZM4OzzUa .

geeknoid avatar Aug 10 '17 19:08 geeknoid

Happy to treat as '3rd party' extension while we work this out

louiscryan avatar Aug 10 '17 20:08 louiscryan

Apologies for the late reply. Maps and lists with a limited number of entries are typically handled within code, but for larger datasets (say > 500 entries), Firebase prefers an extension that permits policy writers to query the database for the portion of the dataset required to answer the question.

The reasoning for the extension based approach is two-fold:

  • Large datasets written inline in a policy make the policy much more difficult to manage
  • All data provided into the policy must be marshalled into a format the policy expects and marshalling data can easily dominate evaluation time.

TristonianJones avatar Aug 11 '17 15:08 TristonianJones

@louiscryan I'm happy to work on this.

I'll start with a bit of prototyping, but follow up with a design doc to circ before I go whole hog.

spikecurtis avatar Aug 11 '17 21:08 spikecurtis

@louisryan yes, you can load arbitrary attributes into OPA and then refer to them in policies/queries.

Currently the data is stored using native Go types (e.g., maps, slices, etc). This has been benchmarked on a container scheduling use case where the data set was in the 100s of MBs. We have plans to optimize the storage format in the future, e.g., with string de-duplication, better number encoding, etc.

OPA was designed from the beginning with this kind of “attribute production” use case in mind.

In OPA, you write rules that define attributes (JSON documents) that can be queried. When you query OPA, the engine evaluates the rules and produces attribute values.

For example, given the some data set…

apps = [{

  name: web
  ports: [443, 8080, 80]

}, {
  name: db
  ports: [443]

}, …]


insecure_ports = {80, 8080}


You could define an attribute production rule as follows…


# generate “is_insecure_port” attribute w/ value “true” if ...
is_insecure_port = true {

  request.dest_app = data.apps[i].name                # lookup destination app in data set
 based on incoming request
  data.apps[i].ports[_] = data.insecure_ports[_]      # check if app exposes insecure port
}

FWIW, we have demo'd logical flow you described above several times as part of Kubernetes Admission Control use cases (e.g., for allow/deny decisions, app placement, etc.)

Check out the Kubernetes docs to see how this is done for app placement in Federation: https://kubernetes.io/docs/tasks/federation/set-up-placement-policies-federation/#configuring-placement-policies-via-configmaps

/cc @TristonianJones @timothyhinrichs

torin-styra avatar Aug 14 '17 17:08 torin-styra

thanks @spikecurtis for picking up the baton

louiscryan avatar Aug 14 '17 17:08 louiscryan

We're having a discussion in the forums on the general idea of adapters reading CRDs. I am fairly strongly convinced that this would be a substantial architectural mistake.

https://groups.google.com/d/msg/istio-security/4nR-qQhtPKY/cVU9C41aBAAJ

geeknoid avatar Aug 16 '17 22:08 geeknoid

The "predicate" check adapter is an interesting case. In the 0.2 design, there will be an Action that binds instances to an handler (configured adapter), and Actions will support a CEXL predicate in their match clause. We might be able to use this mechanism directly, with a very simple adapter that just allows all instances it receives.

I guess it depends on what the default Check() result is when no actions match. @geeknoid what is the planned Mixer behavior?

spikecurtis avatar Aug 18 '17 18:08 spikecurtis

Check returns a thumbs up when no actions match.

ZackButcher avatar Aug 18 '17 18:08 ZackButcher

The config I have so far looks like this:

adapters:
  - name: mapset
    kind: attributes
    impl: mapSet
    params:
      attributes:
        - attributeName: isAdmin
          kind: set
          setName: admin
          key: user
        - attributeName: userLocation
          kind: map
          mapName: locations
          key: user
      sets:
        - name: admin
          values: ["spike", "alex", "ratan", "sammy"]
        - name: exec_team
          values: ["andy", "alex", "ratan", "christopher"]
      maps:
        - name: locations
          items:
            spike: "Seattle"
            alex: "San Francisco"
            shaun: "London"

This creates an instance of the map/set attribute generating adapter. The maps and sets used by this adapter are configured inline. In the above example, there are 2 sets, named admin and exec_team, and 1 map, named locations. It also defines attributes that will be generated based on the maps or sets. In the above example, there are 2 attributes. The first is isAdmin which checks if the user is a member of the set admin. The second is userLocation which will be the value of the location map for the key defined user. In both cases user is the input variable to the adapter. So if the input were {user: spike}, then it would output {isAdmin: true, userLocation: Seattle}.

An operator would configure Mixer to use this adapter by setting the following aspect rules:

rules:
  - aspects:
    - kind: attributes
      adapter: mapset
      params:
        input_expressions:
          user: origin.user
        attribute_bindings:
          isAdmin: isAdmin
          userLocation: userLocation

This configuration maps the attribute origin.user to the input user that is expected by the adapter, and then maps the outputs isAdmin and userLocation to attributes of the same name.

Ok, so that gets us the first part, generating new attributes based on defined maps and sets. The second part of the proposal was to use these (possibly with other attributes) in a Check() adapter that is configured with a CEXL predicate. I've defined a new abac aspect for this (but will transition to a template soon).

adapters:
  - name: default
    kind: abac
    impl: cexlPredicate
    params:
      allow: isAdmin || userLocation == "London"
rules:
  - aspects:
    - kind: abac
      params:
        attributes:
          isAdmin: isAdmin
          userLocation: userLocation

The cexlPredicate adapter takes a single parameter, allow which is a CEXL predicate. The aspect maps the attribute names evaluated in the CEXL expression to the actual attributes in the request. In this case we have preserved the names, but the operator could change the names that show up in the predicate this way.

The big question going forward is whether we want to have the definition of sets and maps live in the Adapter config, or whether it should be moved external, for example to K8s CRDs.

spikecurtis avatar Aug 25 '17 20:08 spikecurtis

Could you give a brief explanation of what this is expressing? It's a bit confusing..

geeknoid avatar Aug 25 '17 22:08 geeknoid

@geeknoid I've updated with some commentary. Hopefully it is helpful.

spikecurtis avatar Aug 25 '17 22:08 spikecurtis

Thanks, the comments are very helpful :-)

It seems likely to me that we'll want fine-grained ACLs on who can change which map/set and hence these should be kept as separate resources.

It's also worth considering whether this could be all made easier to consume if we expand the Mixer's expression evaluation code to provide set/map related functions such as contains. Basically, if we believe sets and maps such as shown here are generally useful, let's make them first class within our model. Mixer already understands STRING_MAP attributes, so extending our semantic and syntactic support to include sets and manipulation of sets and maps seems natural.

geeknoid avatar Aug 29 '17 20:08 geeknoid

Maps should be separate resources, whose ACLS are managed like any other istio resources. This is part of the original description.

I believe that providing a uniform way to store resources and connect them to Mixer EL is important and useful. This feature request fits perfectly in that. Resources are owned by the attribute producing adapter, but they are managed with all other Istio storage.

mandarjog avatar Aug 29 '17 21:08 mandarjog

I was thinking about set/map functionality in CEXL as I was working through the prototype.

CEXL supports map evaluation with literals, but does it support it with attributes today? E.g. if I have attributes:

user="spike"
location = {"spike": "Seattle", "alex": "San Francisco"}

can I write an expression

location[user] == "Seattle" 

spikecurtis avatar Aug 29 '17 23:08 spikecurtis