dapr icon indicating copy to clipboard operation
dapr copied to clipboard

proposal: Pluggable Components Registry

Open mcandeia opened this issue 3 years ago • 17 comments

In what area(s)?

/area runtime

/area operator

/area placement

/area docs

/area test-and-release

Describe the proposal

Context

This proposal is focused on improving the relationship between pluggable components CRD and component the CRD.

The naming problem

Be backed by kubernetes is almost the same as saying that kubernetes is also a kind of component registry, also means that all available Pluggable Components CRD is part of a unique registry for dapr point of view.

Currently, we are using the metadata.name as part of the component name when used by dapr enduser. So let's take an example,

For the given pluggable component

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: kafka-pluggable
spec:
  type: outputbinding
  version: v1

this will be translated to: bindings.kafka-pluggable version v1 with only outputbindings implemented (yes you can have only input/output separately)

referencing in a component:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pluggable-test-topic
spec:
  type: bindings.kafka-pluggable

Note that the metadata.name is used as the component name, and the remaining fields are used for defining its type and its version, so the final component name will be ${spec.type}.${metadata.name}-${spec.version}

But now let's say we want to realase the major 2 of kafka bindings. cool, let's create a new pluggable component then

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: kafka-pluggable
spec:
  type: outputbinding
  version: v2

Did you notice the problem? Yes, the object name is the same, hence the kubernetes will treat as the same object and not allowing two majors running at same time (which is totally fine and desired when doing migrations)

The second problem is for the same reason but now, let's say we want to implement the inputbinding for the kafka pluggable component, how it should look like?

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: kafka-pluggable
spec:
  type: inputbinding
  version: v1

Yes, same problem as before, so it will endup of having few choices:

Name is everything approach

For name is everything approach we have few advantages from leveraging kubernetes features to keep uniqueness across objects.

So instead of

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: kafka-pluggable
spec:
  type: inputbinding
  version: v1

we would have something like

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: inputbinding.kafka-pluggable-v1

Disadvantages:

  1. object name restrictions (max chars 45, lot of characters are not allowed, etc)
  2. We can't validate the format when applying objects

Advantage:

  1. Leverages kubernetes name uniqueness property

Name is nothing approach (preferred)

In the name is nothing approach we don't use the metadata.name for nothing, instead we should have a name in the crd spec.

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: does-not-matter
spec:
  name: kafka-pluggable
  type: outputbinding
  version: v1

Disadvantage:

We can have duplicated components with different names, e.g:

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: does-not-matter-2
spec:
  name: kafka-pluggable
  type: outputbinding
  version: v1

For that, a simple solution would be getting first by lexicographical order for the metadata name or last-applied wins + warning log.

Other alternatives:

  1. Runtime error
  2. Validation hook

We also have problems associated with versioning e.g Proto versioning and component versioning should be made separately?

But it deserves its own issue.

mcandeia avatar Sep 21 '22 00:09 mcandeia

cc: @johnewart @artursouza @yaron2 @tmacam

mcandeia avatar Sep 21 '22 01:09 mcandeia

The second approach is much preferable to the first.

Reference fields in CRDs referencing other Kubernetes objects are a common design and I am not at all worried about the fact there can be 2 pluggable components CRDs referencing the same component. A runtime error would be the correct way to handle this scenario..

Separate question: why do we need the type field and not infer that from the referenced Component?

yaron2 avatar Sep 21 '22 03:09 yaron2

Excellent question,

We can, but it requires changing our component registry behavior or a hint in the component crd telling the runtime if such component is pluggable(I’m not a fan of this one)

This is used to select which registry such pluggable component must be added. Loading pluggable components is the same as calling

${specType}Registry.RegisterComponent(“kafka-pluggable”, newKafkaPluggable)

in other words the spec type is used to ensure that the component will be already registered before the loading components phase

mcandeia avatar Sep 21 '22 09:09 mcandeia

A runtime error would be the correct way to handle this scenario

Runtime error in this case means shutting down the sidecar?

Keep in mind that it could be out of control of dapr enduser. Considering the following scenario:

Two different helm charts being developed by two different vendors (pluggable component implementers) with same name but containing different set of components with only one intersection:

Let's say two helm charts A and B:

Helm chart A installs pubsub.redis and pubsub.kafka Helm chart B installs state.redis and pubsub.kafka

So the enduser can't use them together because of the naming conflict?

That's another solution for it,

having the ComponentRegistry CRD and removing the PluggableComponentCRD,

Terminology

  • Vendors: Component Implementers
    • A vendor has an uniqueID let's say aws
  • Registry: A list of curated components maintained by a vendor

Registry

Vendors releases components via ComponentRegistry CRD

apiVersion: dapr.io/v1alpha1
kind: ComponentRegistry
metadata:
  name: aws
spec:
  state:
    dynamo@v1:
  bindings:
    dynamo@v1:
  pubsub:
    sqs@v2:
    sqs@v1:

component name would be ${type}.${vendor}.${name}

${vendor} being unique across registries e.g aws, azure i.e pubsub.aws.sqs

for me as a developer it should look like this:

apiVersion: dapr.io/v1alpha1
kind: ComponentRegistry
metadata:
  name: mcandeia
spec:
  state:
    dynamo@v1:
  bindings:
    dynamo@v1:
  pubsub:
    sqs@v2:
    sqs@v1:

using: pubsub.mcandeia.dynamo

What does that solves?

  • This is semantically closer to what we have in the code
  • This allows us to leverage the uniqueness of the metadata name because theoretically each vendor will have its unique ID and this allows having the components namespaced by vendors (it is clearer for the user)

mcandeia avatar Sep 21 '22 12:09 mcandeia

Runtime error in this case means shutting down the sidecar?

Yes. It's a valid reason to panic.

So the enduser can't use them together because of the naming conflict?

Correct.

For improved simplicity, user experience and Dapr code maintenance, I propose we remove PluggableComponent and not use any new CRD for pluggable components. Rather, we reuse the existing Component CRD and add pluggable features to it, so a component can become pluggable. This has the benefit of:

  1. Best UX for developers and operators - no new CRDs needed
  2. Decreased maintenance for the Dapr codebase

Draft CRD

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: customStateStore
spec:
  type: pubsub.yaron
  version: v1
  metadata:
  - name: table
    value: "mytable"
  - name: customProperty
    value: "something"
  pluggable:
    enabled: true

In this case, we reuse all the existing types in the type field without providing a concrete implementation since this is a pluggable component which is not registered in the runtime.

This reuses the existing CRD and thus does not run into naming issues. In addition, any future features and corresponding fields for pluggable components are added to the pluggable section.

WDYT?

yaron2 avatar Sep 21 '22 16:09 yaron2

I like the simplicity but it seems that it blocks a lot of things in the way, for example, if we add more attributes in the pluggable component this needs to be repeated in each CRD component that points to the same pluggable component.

For now, it looks like it's okay because we don't have any attributes other than the pluggable component name, but if you add at least one, you'll have to start repeating in each CRD.

Also, we need to agree first that they are different personas when dealing with pluggable components/components, and even if it's the same person sometimes, they will have different concerns,

Does it makes sense to you ?

mcandeia avatar Sep 21 '22 16:09 mcandeia

Does it makes sense to you ?

Not very much, no :)

if we add more attributes in the pluggable component this needs to be repeated in each CRD component that points to the same pluggable component.

The proposed design above maintains there is no "pointing" between a component CRD to a pluggable component. There is a single Component definition and that can be defined as pluggable or not.

This should also address your next concerns about personas and adding fields.

yaron2 avatar Sep 21 '22 16:09 yaron2

Your proposal is fine considering the requirements now. But it is just a way to add an "inline" pluggable component, so we can move all fields from pluggable component crd to the component crd as pluggable property,

What I'm trying to consider is a world where pluggable components definitions are reusable across components, like we started to discuss in this conversation, are you considering that as a possibility?

mcandeia avatar Sep 21 '22 16:09 mcandeia

I would say if we go for "inlining" the pluggable component we could use something like:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: customStateStore
spec:
  type: pubsub.grpc
  version: v1
  metadata:
  - name: table
    value: "mytable"
  - name: customProperty
    value: "something"

no changes required

mcandeia avatar Sep 21 '22 16:09 mcandeia

What I'm trying to consider is a world where pluggable components definitions are reusable across components

I would say this is out of scope, due to:

  1. It isn't common to share definitions between CRDs or even Kubernetes objects. This is largely outside the scope of Dapr
  2. Components today have many potential sharable definitions (credentials as an example that stands out the most). We have seen no asks to make this sharable across components. And these are definitions that every developer and/or operator interface with the most. Resiliency CRDs are another very good example of many fields that could become sharable, but the commonality of the "per resource instance, encapsulated configurations" proved itself in this design.
  3. The number of fields pluggable components will have will likely be much smaller, orders of magnitude even, then components and resiliency CRDs today
  4. If it does happen in the future that pluggable components will have dozens of properties AND most of these are sharable, then we can safely introduce a pluggable component configuration CRD that is referenced, or even put this into the Configuration CRD. But again, this is extremely unlikely.

Re: the CRD proposal you made, I think this is less explicit for users and less clear that they are looking and/or declaring a pluggable component. pluggable.enabled makes the intention clear, it is typed and can be validated via tooling. In addition, it makes room to add any fields that are reserved for pluggable definitions in the future.

yaron2 avatar Sep 21 '22 17:09 yaron2

Oh wait, just to make something clear, when I say "pluggable components definitions are reusable across components" it literally means "Reference fields in CRDs referencing other Kubernetes objects are a common design"

I mean, declaring it once and reuse by referencing the object name

mcandeia avatar Sep 21 '22 17:09 mcandeia

Oh wait, just to make something clear, when I say "pluggable components definitions are reusable across components" it literally means this "Reference fields in CRDs referencing other Kubernetes objects are a common design"

So those are actually two different things: sharable definitions vs reference fields. As an example of a common use case for reference fields, look at Kubernetes selectors as an example - where a Kubernetes service is a totally separate resource but it references another resource which is a Deployment.

In our case, you are talking about sharing common fields for the purpose of sharing them between the same object type (Component), which is what I referred to above.

Looking at our use case here, there is no reason to have a separate CRD really as pluggable components are still components, logically, and thus it makes sense to be able to declare and define pluggable features in the component.

yaron2 avatar Sep 21 '22 17:09 yaron2

Please be clear if you are against this proposal as this brings more clarity to me, because I am struggling on considering this possibility and it leads me to different alternatives. If we do not consider this possibility, I am 100% in favor of your proposal. But if we consider I would like to see a world where I can say that the

pubsub.redis-pluggable-v1 points to the container: ghcr.io/redis/[email protected] and pubsub.redis-pluggable-v2 points to the container ghcr.io/redis/[email protected]

which contains other properties like env vars, volumes, resource limits, that, from my perspective, should not be a user concern, hence it is redis specific. So the component CRD should not contain those specifics fields.

It's totally fine to be against it, but I need to know so that design decisions are made in a way that either allow that future or disregards that future, or at least, exercises how the now connects with that future

mcandeia avatar Sep 21 '22 18:09 mcandeia

The only thing I'm favoring less is the ComponentRegistry design. I can get behind a PluggableComponent CRD with runtime validation of duplicate names.

However, I'm proposing an alternative design that, in my opinion, offers a better UX for developers, operators and less maintenance burden for Dapr maintainers.

which contains other properties like env vars, volumes, resource limits, that, from my perspective, should not be a user concern, hence it is redis specific. So the component CRD should not contain those specifics fields.

I would argue that Component and PluggableComponent are targeting the same personas, and having the mentioned fields in the Component CRD makes sense when considering the PoV of both personas.

Developer

In a dev test environment, developers need full autonomy to define tthe container image, volumes etc. Having this in the Component serves that purpose and enables a dev/test scenario with minimum additional overhead.

In a production environment, developers will not be authorized access to components, and thus will not be able to define said fields

Operator

As an operator defining a component in any environment, specifically production, they will have access to define the component including the packaged code (container image, volumes etc.)

What do you think? cc @artursouza

yaron2 avatar Sep 21 '22 19:09 yaron2

Now we are talking,

I’ll wait for artur opinion to have one more perspective here.

mcandeia avatar Sep 21 '22 19:09 mcandeia

After a good night of sleep I think we can get back to this discussion,

So I would like to split the discussion in two, 1) what we are going to do targeting 1.9, and 2) what is the state of art for pluggable components - desired future

for 1) due to the tightly deadline, I would argue that should keep the Pluggable Component CRD and add name as a spec attr rather than using from metadata.name, are you ok with it @yaron2 ?

mcandeia avatar Sep 22 '22 13:09 mcandeia

I don't have any strong opinion other than making things clear and easy to use -- in that regard I think that using only the Component CRD with some metadata doesn't quite communicate that fact that a pluggable component is adding itself to the registry (in theory they could be multiplexed in the future and not be a 1:1 mapping) and to that end I think the PluggableComponent CRD makes that differentiation a little more clear (also doesn't require parsing the same CRDs twice in two different contexts which feels a little confusing).

johnewart avatar Sep 22 '22 18:09 johnewart

I don't have any strong opinion other than making things clear and easy to use -- in that regard I think that using only the Component CRD with some metadata doesn't quite communicate that fact that a pluggable component is adding itself to the registry (in theory they could be multiplexed in the future and not be a 1:1 mapping) and to that end I think the PluggableComponent CRD makes that differentiation a little more clear (also doesn't require parsing the same CRDs twice in two different contexts which feels a little confusing).

By registry you mean the Dapr component registry in daprd, correct? If Dapr scans the component and adds it to the corresponding registry based on the pluggable explicit definition, how is that less clear than another CRD?

Also I don't understand the comment about why the same CRD would be parsed twice? What contexts are you referring to?

yaron2 avatar Sep 22 '22 18:09 yaron2

I didn't get the registry part,

I may have misunderstood, but the registry currently is used for register and create new instances of named components, but in your proposal the component is inline, correct?

See this example, could you clarify/exemplify how it should look like for registering the pluggable component you mentioned?


pubsubLoader.DefaultRegistry.RegisterComponent(pubsub.NewPluggable, "customStateStore") // the metadata name?

So that I can reuse it as pubsub.customStateStore? if yes, I agree with John seems that we're mixing "instantiate" a component with "defining" a new component

And there's another problem with it: the order matters.

All components that are pluggable should be processed first to make sure it will only be referenced later

mcandeia avatar Sep 22 '22 19:09 mcandeia

Since users don't know about or interact with our registry in any way since this is an implementation details at the lowest level, we should keep this out of scope for the CRD user experience discussion.

In that context, I believe reusing the Component CRD and adding a clear and explicit pluggable section enables greater simplicity for users, keeps both personas (developer and operator) aligned as explained above, and lowers maintenance for Dapr maintainers and the community.

yaron2 avatar Sep 23 '22 15:09 yaron2

@yaron2 Let's say we decided to take the approach you mentioned, for bindings specifically, how would you suggest to discover if the component implements input or output or both? With the Pluggable Components CRD approach it has different names,

type: inputbinding and type: outputbinding,

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: customBinding
spec:
  type: bindings
  version: v1
  metadata:
  - name: table
    value: "mytable"
  - name: customProperty
    value: "something"
  pluggable:
    enabled: true

Create a new field that serves only for bindings?

mcandeia avatar Sep 26 '22 17:09 mcandeia

Can we make pluggable components consistent with how we discover the binding direction today?

yaron2 avatar Sep 26 '22 17:09 yaron2

Currently, we are using Has methods from Bindings registry. That Has methods returns true when a component with the specified name + version is available to be created. For inlining

  1. Or we have to "Register" the pluggable component (which does not makes sense to me due to this)
  2. Or we don't use registries at all and the Has won't work

So I think we need a different approach for that

mcandeia avatar Sep 26 '22 18:09 mcandeia

For documentation purposes:

As we aligned offline, for the 1.9 release:

  1. Pluggable Components CRD will be removed.
  2. The bindings support will require implementing both, input and output.

cc: @yaron2 @artursouza

mcandeia avatar Sep 26 '22 19:09 mcandeia

I'm on it right now, and as soon as I find a blocker I'll need your help to align it quickly so we don't get stuck as we have a tightly deadline

Related to our "check" if socket exists or not on registry

As we aligned here the socket name is based on the ComponentCRD metadata.Name, since our registries does not have access to this information (currently, it is passed on Init), we have few options:

  1. Assume that the used CRD type will be unique - each component type has its own socket
  2. Pass this info to the registries (I would avoid that, to lead in having different registries APIs - would become difficult to create a single abstraction)
  3. do not check for the socket file on the registry

To exemplify:

currently:

apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
  name: dotnetredis
spec:
  type: state
  version: v1
---
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: userstore
spec:
   type: state.dotnetredis
   version: v1

socket: /var/run/dapr-state.dotnetredis-v1-userstore.sock

with the proposed changes

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: userstore
spec:
   type: state.dotnetredis
   version: v1

socket: /var/run/dapr-state.dotnetredis-v1.sock

cc: @yaron2 @artursouza

mcandeia avatar Sep 26 '22 19:09 mcandeia

As aligned offline with @yaron2 and @artursouza, if we have enough time to do, l'll implement the following changes:

Pluggable Components AutoRegistration

Use a common and shared folder to create sockets with arbitrary names. Dapr Runtime will discover components by looking into a specific folder for sockets available to connect. The socket name (without any extension) will be the component name that should be used in the component CRD.

gRPC Reflection API

Components are required to implement gRPC reflection API. This is used to identify which building blocks that socket implement and further used by registering in the respective registry.

mcandeia avatar Sep 26 '22 21:09 mcandeia

In addition to that: I'm going to remove the pluggable components feature flag as we are relying on socket creation to work. So I think it should be sufficient.

mcandeia avatar Sep 27 '22 00:09 mcandeia

In addition to that: I'm going to remove the pluggable components feature flag as we are relying on socket creation to work. So I think it should be sufficient.

I agree.

yaron2 avatar Sep 27 '22 00:09 yaron2