osm
Canonicalize on 1 or multiple meshes per cluster
Right now the code is sort of in this mid-way state between allowing multiple meshes in a cluster, and having one mesh per cluster.
Do we have a stance on what we want to do going forward? For instance some feature requests may be simpler to implement based on the single mesh assumption.
FWIW, somebody could more or less achieve the benefits of multiple meshes through certificate manipulation if we were to add that feature, while retaining the concept of a single mesh.
That's a good question, and agreed that we're somewhere in between implementations for single vs. multiple mesh. I think the driving question that we need to answer here (which I don't have the answer to right now) is whether we want OSM to be focused on single- or multi-tenant usage for Kubernetes. The argument for supporting multiple meshes in a single cluster would be if there were multiple teams (or even just different software) that shouldn't share the same service mesh.
I'm not sure how valid that requirement is, but I would be curious to hear others' opinions on this.
In a practical sense, multi-mesh is essentially just multi-trust-domain, right? I have a hard time envisioning an organization running multiple trust domains in a single cluster. An empty TrafficTarget already denies traffic, right? Is there a scenario where that's not sufficient?
Here's some context from the time we introduced this feature:
The initial idea to support multiple meshes was to address multi-tenancy within the same cluster and provide isolation primitives in terms of policies, for both the user and control plane. This was done using the openservicemesh.io/monitored-by: <mesh-name> label. An immediate use case back then was to allow multiple CI runs from different pull requests running on the same underlying k8s cluster to not interfere with each other. This worked well for a long time. Then came the need to support upgrades, which involved global resources such as CRDs, webhook configs, etc., and designing this for multiple meshes was not an immediate priority for the project.
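To make the label-based isolation concrete, here is a minimal sketch of how namespaces could be partitioned by the monitored-by label. It is not OSM's actual implementation: the function name, the namespace names, and the mesh names are hypothetical, and the cluster is modeled as a plain dictionary rather than queried through the Kubernetes API.

```python
# Label key taken from the discussion above; everything else is illustrative.
MONITORED_BY_LABEL = "openservicemesh.io/monitored-by"


def namespaces_for_mesh(namespaces, mesh_name):
    """Return the sorted names of namespaces labeled as monitored by mesh_name.

    `namespaces` maps a namespace name to its labels (a dict of str -> str).
    """
    return sorted(
        name
        for name, labels in namespaces.items()
        if labels.get(MONITORED_BY_LABEL) == mesh_name
    )


# Hypothetical cluster state: two CI runs sharing one cluster.
namespaces = {
    "bookstore-pr-101": {MONITORED_BY_LABEL: "osm-pr-101"},
    "bookbuyer-pr-101": {MONITORED_BY_LABEL: "osm-pr-101"},
    "bookstore-pr-202": {MONITORED_BY_LABEL: "osm-pr-202"},
    "kube-system": {},  # unlabeled namespaces belong to no mesh
}

print(namespaces_for_mesh(namespaces, "osm-pr-101"))
print(namespaces_for_mesh(namespaces, "osm-pr-202"))
```

Each control plane instance would only act on the namespaces returned for its own mesh name, which is what keeps the two CI runs from interfering with each other.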
So the idea behind multiple meshes was to provide logical isolation for the control plane and user policies applicable to a mesh instance. I am unsure if customers have such a requirement, but our use case back then was to allow running multiple OSM instances on the same cluster. If we do not see a need for such a scenario anymore, supporting a single mesh will simplify some of the existing components.
Multiple CI runs is an interesting case, although I'd imagine that applies to us more so than to customers? If so, it seems like what we have with KIND is a working solution.
My vote would be to remove the ability to have multiple meshes. If we ever do find a strong customer need, we can re-introduce the feature by allowing multiple cert chains (1 per tenant), and mapping those to each logical mesh, which would solve the multiple global resource issue.
In the end, I'm not too opinionated about whatever we choose here, but I do see value in coming to a decision.
I can imagine a similar scenario for customers, though it is uncommon in practice.
Multiple meshes within a cluster are not just about certificates. There is a lot more to it. The multi-mesh feature allows multiple control plane instances to co-exist, each managing its own logical mesh instance without interfering with the others, i.e., a policy applied in one mesh has no bearing on other meshes.
Currently, the multi-mesh feature only lacks upgrade support, as it results in a global state change on the cluster (CRDs, conversion webhooks etc.), and would affect other meshes.
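The upgrade problem can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Cluster` class, the `upgrade_mesh` function, and the version strings are hypothetical and do not correspond to OSM code. The point is that per-mesh state is isolated while a cluster-scoped resource such as a CRD has a single copy shared by every mesh.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Cluster-scoped state: one copy shared by all meshes (stands in for CRDs,
    # conversion webhooks, and similar global resources).
    crd_version: str = "v1alpha1"
    # Per-mesh state: control plane version keyed by mesh name.
    control_planes: dict = field(default_factory=dict)


def upgrade_mesh(cluster, mesh_name, control_plane_version, crd_version):
    """Upgrade one mesh; note the unavoidable write to shared state."""
    cluster.control_planes[mesh_name] = control_plane_version
    cluster.crd_version = crd_version  # global side effect, seen by every mesh


cluster = Cluster(control_planes={"mesh-a": "1.0", "mesh-b": "1.0"})
upgrade_mesh(cluster, "mesh-a", "1.1", "v1alpha2")

# mesh-b's control plane was not upgraded, yet it now runs against the new CRD.
print(cluster.control_planes["mesh-b"], cluster.crd_version)
```

In this model, mesh-b is left running an old control plane against a CRD version it was never upgraded for, which is the kind of cross-mesh interference the comment above describes.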
Just to throw another wrinkle into this, multi-mesh is a superset of a separate, more common problem: canary upgrades between control plane mesh versions. In most implementations of this process, two instances of the control plane run at the same time, with each instance handling a subset of resources. Before coming to a decision on the larger question of multi-mesh support, it's probably worth shoring up our user stories to better understand what problems our users will be trying to solve.
These are all really good points. From my perspective, we have three options here:
- Keep the underlying multi-mesh code as-is and remove the user experience from the CLI so that it isn't confusing.
- Remove all the multi-mesh code from the product.
- Complete the multi-mesh feature so that it is usable and reliable.
My personal vote would be for the first option. Without heavy user demand, I'm not sure we want to commit to this work in the short term. But likewise, I don't want to rip out all the code and rely on Git history to piece it back together if/when we decide to re-add this feature in the future.
But agreed, @steeling, the current status of this is not ideal and we should do something to put it in a more consistent state that isn't confusing or misleading.
Curious to others' opinions on this and how you all would like to see this transition.
I agree with everything mentioned. I think the one question that remains is: when adding a new feature, should we maintain the ability to leverage a second mesh? If the answer is no, then over a long enough period of time we would likely end up with a Frankenstein feature that would need to be rewritten from scratch.
That said, I think it's a good approach in the interim, and I'd add my vote that we go with option #1 that @trstringer mentioned. Side note, this would have immediate implications for #4613
This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.
Issue closed due to inactivity.
Added default label size/needed. Please consider re-labeling this issue appropriately.
Hi @shashankram
Hope you are doing well. Do we have any plans to support multiple meshes in the future? Thanks.
This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.
Issue closed due to inactivity.