dapr
dapr copied to clipboard
proposal: Pluggable Components Registry
In what area(s)?
/area runtime
/area operator
/area placement
/area docs
/area test-and-release
Describe the proposal
Context
This proposal is focused on improving the relationship between pluggable components CRD and component the CRD.
The naming problem
Be backed by kubernetes is almost the same as saying that kubernetes is also a kind of component registry, also means that all available Pluggable Components CRD is part of a unique registry for dapr point of view.
Currently, we are using the metadata.name as part of the component name when used by dapr enduser. So let's take an example,
For the given pluggable component
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: kafka-pluggable
spec:
type: outputbinding
version: v1
this will be translated to: bindings.kafka-pluggable version v1 with only outputbindings implemented (yes you can have only input/output separately)
referencing in a component:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: pluggable-test-topic
spec:
type: bindings.kafka-pluggable
Note that the metadata.name is used as the component name, and the remaining fields are used for defining its type and its version, so the final component name will be ${spec.type}.${metadata.name}-${spec.version}
But now let's say we want to realase the major 2 of kafka bindings. cool, let's create a new pluggable component then
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: kafka-pluggable
spec:
type: outputbinding
version: v2
Did you notice the problem? Yes, the object name is the same, hence the kubernetes will treat as the same object and not allowing two majors running at same time (which is totally fine and desired when doing migrations)
The second problem is for the same reason but now, let's say we want to implement the inputbinding for the kafka pluggable component, how it should look like?
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: kafka-pluggable
spec:
type: inputbinding
version: v1
Yes, same problem as before, so it will endup of having few choices:
Name is everything approach
For name is everything approach we have few advantages from leveraging kubernetes features to keep uniqueness across objects.
So instead of
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: kafka-pluggable
spec:
type: inputbinding
version: v1
we would have something like
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: inputbinding.kafka-pluggable-v1
Disadvantages:
- object name restrictions (max chars 45, lot of characters are not allowed, etc)
- We can't validate the format when applying objects
Advantage:
- Leverages kubernetes name uniqueness property
Name is nothing approach (preferred)
In the name is nothing approach we don't use the metadata.name for nothing, instead we should have a name in the crd spec.
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: does-not-matter
spec:
name: kafka-pluggable
type: outputbinding
version: v1
Disadvantage:
We can have duplicated components with different names, e.g:
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: does-not-matter-2
spec:
name: kafka-pluggable
type: outputbinding
version: v1
For that, a simple solution would be getting first by lexicographical order for the metadata name or last-applied wins + warning log.
Other alternatives:
- Runtime error
- Validation hook
We also have problems associated with versioning e.g Proto versioning and component versioning should be made separately?
But it deserves its own issue.
cc: @johnewart @artursouza @yaron2 @tmacam
The second approach is much preferable to the first.
Reference fields in CRDs referencing other Kubernetes objects are a common design and I am not at all worried about the fact there can be 2 pluggable components CRDs referencing the same component. A runtime error would be the correct way to handle this scenario..
Separate question: why do we need the type field and not infer that from the referenced Component?
Excellent question,
We can, but it requires changing our component registry behavior or a hint in the component crd telling the runtime if such component is pluggable(I’m not a fan of this one)
This is used to select which registry such pluggable component must be added. Loading pluggable components is the same as calling
${specType}Registry.RegisterComponent(“kafka-pluggable”, newKafkaPluggable)
in other words the spec type is used to ensure that the component will be already registered before the loading components phase
A runtime error would be the correct way to handle this scenario
Runtime error in this case means shutting down the sidecar?
Keep in mind that it could be out of control of dapr enduser. Considering the following scenario:
Two different helm charts being developed by two different vendors (pluggable component implementers) with same name but containing different set of components with only one intersection:
Let's say two helm charts A and B:
Helm chart A installs pubsub.redis and pubsub.kafka Helm chart B installs state.redis and pubsub.kafka
So the enduser can't use them together because of the naming conflict?
That's another solution for it,
having the ComponentRegistry CRD and removing the PluggableComponentCRD,
Terminology
- Vendors: Component Implementers
- A vendor has an uniqueID let's say
aws
- A vendor has an uniqueID let's say
- Registry: A list of curated components maintained by a vendor
Registry
Vendors releases components via ComponentRegistry CRD
apiVersion: dapr.io/v1alpha1
kind: ComponentRegistry
metadata:
name: aws
spec:
state:
dynamo@v1:
bindings:
dynamo@v1:
pubsub:
sqs@v2:
sqs@v1:
component name would be ${type}.${vendor}.${name}
${vendor} being unique across registries e.g aws, azure
i.e pubsub.aws.sqs
for me as a developer it should look like this:
apiVersion: dapr.io/v1alpha1
kind: ComponentRegistry
metadata:
name: mcandeia
spec:
state:
dynamo@v1:
bindings:
dynamo@v1:
pubsub:
sqs@v2:
sqs@v1:
using: pubsub.mcandeia.dynamo
What does that solves?
- This is semantically closer to what we have in the code
- This allows us to leverage the uniqueness of the metadata name because theoretically each vendor will have its unique ID and this allows having the components namespaced by vendors (it is clearer for the user)
Runtime error in this case means shutting down the sidecar?
Yes. It's a valid reason to panic.
So the enduser can't use them together because of the naming conflict?
Correct.
For improved simplicity, user experience and Dapr code maintenance, I propose we remove PluggableComponent and not use any new CRD for pluggable components. Rather, we reuse the existing Component CRD and add pluggable features to it, so a component can become pluggable. This has the benefit of:
- Best UX for developers and operators - no new CRDs needed
- Decreased maintenance for the Dapr codebase
Draft CRD
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: customStateStore
spec:
type: pubsub.yaron
version: v1
metadata:
- name: table
value: "mytable"
- name: customProperty
value: "something"
pluggable:
enabled: true
In this case, we reuse all the existing types in the type field without providing a concrete implementation since this is a pluggable component which is not registered in the runtime.
This reuses the existing CRD and thus does not run into naming issues. In addition, any future features and corresponding fields for pluggable components are added to the pluggable section.
WDYT?
I like the simplicity but it seems that it blocks a lot of things in the way, for example, if we add more attributes in the pluggable component this needs to be repeated in each CRD component that points to the same pluggable component.
For now, it looks like it's okay because we don't have any attributes other than the pluggable component name, but if you add at least one, you'll have to start repeating in each CRD.
Also, we need to agree first that they are different personas when dealing with pluggable components/components, and even if it's the same person sometimes, they will have different concerns,
Does it makes sense to you ?
Does it makes sense to you ?
Not very much, no :)
if we add more attributes in the pluggable component this needs to be repeated in each CRD component that points to the same pluggable component.
The proposed design above maintains there is no "pointing" between a component CRD to a pluggable component. There is a single Component definition and that can be defined as pluggable or not.
This should also address your next concerns about personas and adding fields.
Your proposal is fine considering the requirements now. But it is just a way to add an "inline" pluggable component, so we can move all fields from pluggable component crd to the component crd as pluggable property,
What I'm trying to consider is a world where pluggable components definitions are reusable across components, like we started to discuss in this conversation, are you considering that as a possibility?
I would say if we go for "inlining" the pluggable component we could use something like:
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: customStateStore
spec:
type: pubsub.grpc
version: v1
metadata:
- name: table
value: "mytable"
- name: customProperty
value: "something"
no changes required
What I'm trying to consider is a world where pluggable components definitions are reusable across components
I would say this is out of scope, due to:
- It isn't common to share definitions between CRDs or even Kubernetes objects. This is largely outside the scope of Dapr
- Components today have many potential sharable definitions (credentials as an example that stands out the most). We have seen no asks to make this sharable across components. And these are definitions that every developer and/or operator interface with the most. Resiliency CRDs are another very good example of many fields that could become sharable, but the commonality of the "per resource instance, encapsulated configurations" proved itself in this design.
- The number of fields pluggable components will have will likely be much smaller, orders of magnitude even, then components and resiliency CRDs today
- If it does happen in the future that pluggable components will have dozens of properties AND most of these are sharable, then we can safely introduce a pluggable component configuration CRD that is referenced, or even put this into the
ConfigurationCRD. But again, this is extremely unlikely.
Re: the CRD proposal you made, I think this is less explicit for users and less clear that they are looking and/or declaring a pluggable component.
pluggable.enabled makes the intention clear, it is typed and can be validated via tooling. In addition, it makes room to add any fields that are reserved for pluggable definitions in the future.
Oh wait, just to make something clear, when I say "pluggable components definitions are reusable across components" it literally means "Reference fields in CRDs referencing other Kubernetes objects are a common design"
I mean, declaring it once and reuse by referencing the object name
Oh wait, just to make something clear, when I say "pluggable components definitions are reusable across components" it literally means this "Reference fields in CRDs referencing other Kubernetes objects are a common design"
So those are actually two different things: sharable definitions vs reference fields. As an example of a common use case for reference fields, look at Kubernetes selectors as an example - where a Kubernetes service is a totally separate resource but it references another resource which is a Deployment.
In our case, you are talking about sharing common fields for the purpose of sharing them between the same object type (Component), which is what I referred to above.
Looking at our use case here, there is no reason to have a separate CRD really as pluggable components are still components, logically, and thus it makes sense to be able to declare and define pluggable features in the component.
Please be clear if you are against this proposal as this brings more clarity to me, because I am struggling on considering this possibility and it leads me to different alternatives. If we do not consider this possibility, I am 100% in favor of your proposal. But if we consider I would like to see a world where I can say that the
pubsub.redis-pluggable-v1 points to the container: ghcr.io/redis/[email protected]
and pubsub.redis-pluggable-v2 points to the container ghcr.io/redis/[email protected]
which contains other properties like env vars, volumes, resource limits, that, from my perspective, should not be a user concern, hence it is redis specific. So the component CRD should not contain those specifics fields.
It's totally fine to be against it, but I need to know so that design decisions are made in a way that either allow that future or disregards that future, or at least, exercises how the now connects with that future
The only thing I'm favoring less is the ComponentRegistry design.
I can get behind a PluggableComponent CRD with runtime validation of duplicate names.
However, I'm proposing an alternative design that, in my opinion, offers a better UX for developers, operators and less maintenance burden for Dapr maintainers.
which contains other properties like env vars, volumes, resource limits, that, from my perspective, should not be a user concern, hence it is redis specific. So the component CRD should not contain those specifics fields.
I would argue that Component and PluggableComponent are targeting the same personas, and having the mentioned fields in the Component CRD makes sense when considering the PoV of both personas.
Developer
In a dev test environment, developers need full autonomy to define tthe container image, volumes etc. Having this in the Component serves that purpose and enables a dev/test scenario with minimum additional overhead.
In a production environment, developers will not be authorized access to components, and thus will not be able to define said fields
Operator
As an operator defining a component in any environment, specifically production, they will have access to define the component including the packaged code (container image, volumes etc.)
What do you think? cc @artursouza
Now we are talking,
I’ll wait for artur opinion to have one more perspective here.
After a good night of sleep I think we can get back to this discussion,
So I would like to split the discussion in two, 1) what we are going to do targeting 1.9, and 2) what is the state of art for pluggable components - desired future
for 1)
due to the tightly deadline, I would argue that should keep the Pluggable Component CRD and add name as a spec attr rather than using from metadata.name, are you ok with it @yaron2 ?
I don't have any strong opinion other than making things clear and easy to use -- in that regard I think that using only the Component CRD with some metadata doesn't quite communicate that fact that a pluggable component is adding itself to the registry (in theory they could be multiplexed in the future and not be a 1:1 mapping) and to that end I think the PluggableComponent CRD makes that differentiation a little more clear (also doesn't require parsing the same CRDs twice in two different contexts which feels a little confusing).
I don't have any strong opinion other than making things clear and easy to use -- in that regard I think that using only the
ComponentCRD with some metadata doesn't quite communicate that fact that a pluggable component is adding itself to the registry (in theory they could be multiplexed in the future and not be a 1:1 mapping) and to that end I think thePluggableComponentCRD makes that differentiation a little more clear (also doesn't require parsing the same CRDs twice in two different contexts which feels a little confusing).
By registry you mean the Dapr component registry in daprd, correct? If Dapr scans the component and adds it to the corresponding registry based on the pluggable explicit definition, how is that less clear than another CRD?
Also I don't understand the comment about why the same CRD would be parsed twice? What contexts are you referring to?
I didn't get the registry part,
I may have misunderstood, but the registry currently is used for register and create new instances of named components, but in your proposal the component is inline, correct?
See this example, could you clarify/exemplify how it should look like for registering the pluggable component you mentioned?
pubsubLoader.DefaultRegistry.RegisterComponent(pubsub.NewPluggable, "customStateStore") // the metadata name?
So that I can reuse it as pubsub.customStateStore? if yes, I agree with John seems that we're mixing "instantiate" a component with "defining" a new component
And there's another problem with it: the order matters.
All components that are pluggable should be processed first to make sure it will only be referenced later
Since users don't know about or interact with our registry in any way since this is an implementation details at the lowest level, we should keep this out of scope for the CRD user experience discussion.
In that context, I believe reusing the Component CRD and adding a clear and explicit pluggable section enables greater simplicity for users, keeps both personas (developer and operator) aligned as explained above, and lowers maintenance for Dapr maintainers and the community.
@yaron2 Let's say we decided to take the approach you mentioned, for bindings specifically, how would you suggest to discover if the component implements input or output or both? With the Pluggable Components CRD approach it has different names,
type: inputbinding and type: outputbinding,
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: customBinding
spec:
type: bindings
version: v1
metadata:
- name: table
value: "mytable"
- name: customProperty
value: "something"
pluggable:
enabled: true
Create a new field that serves only for bindings?
Can we make pluggable components consistent with how we discover the binding direction today?
Currently, we are using Has methods from Bindings registry. That Has methods returns true when a component with the specified name + version is available to be created. For inlining
- Or we have to "Register" the pluggable component (which does not makes sense to me due to this)
- Or we don't use registries at all and the
Haswon't work
So I think we need a different approach for that
For documentation purposes:
As we aligned offline, for the 1.9 release:
- Pluggable Components CRD will be removed.
- The bindings support will require implementing both, input and output.
cc: @yaron2 @artursouza
I'm on it right now, and as soon as I find a blocker I'll need your help to align it quickly so we don't get stuck as we have a tightly deadline
Related to our "check" if socket exists or not on registry
As we aligned here the socket name is based on the ComponentCRD metadata.Name, since our registries does not have access to this information (currently, it is passed on Init), we have few options:
- Assume that the used CRD
typewill be unique - each componenttypehas its own socket - Pass this info to the registries (I would avoid that, to lead in having different registries APIs - would become difficult to create a single abstraction)
- do not check for the socket file on the registry
To exemplify:
currently:
apiVersion: dapr.io/v1alpha1
kind: PluggableComponent
metadata:
name: dotnetredis
spec:
type: state
version: v1
---
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: userstore
spec:
type: state.dotnetredis
version: v1
socket: /var/run/dapr-state.dotnetredis-v1-userstore.sock
with the proposed changes
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
name: userstore
spec:
type: state.dotnetredis
version: v1
socket: /var/run/dapr-state.dotnetredis-v1.sock
cc: @yaron2 @artursouza
As aligned offline with @yaron2 and @artursouza, if we have enough time to do, l'll implement the following changes:
Pluggable Components AutoRegistration
Use a common and shared folder to create sockets with arbitrary names. Dapr Runtime will discover components by looking into a specific folder for sockets available to connect. The socket name (without any extension) will be the component name that should be used in the component CRD.
gRPC Reflection API
Components are required to implement gRPC reflection API. This is used to identify which building blocks that socket implement and further used by registering in the respective registry.
In addition to that: I'm going to remove the pluggable components feature flag as we are relying on socket creation to work. So I think it should be sufficient.
In addition to that: I'm going to remove the pluggable components feature flag as we are relying on socket creation to work. So I think it should be sufficient.
I agree.