microshift
[RFE] Multi-node Request for Enhancement
This commit only describes the addition of new compute nodes to an existing MicroShift cluster. A highly available control plane will be described in later PRs.
Signed-off-by: Ricardo Noriega [email protected]
This enhancement proposal addresses part of the #460 epic.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: To complete the pull request process, please ask for approval from oglok after the PR has been reviewed.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
There's a lot going on here. Something I am struggling to wrap my head around: to me, a key point of "OpenShift 4" is that the cluster manages the OS. That isn't handled by MicroShift today (right?).
Now one thing I was asked to comment on here is the relationship to OCP node join. What I would say is basically all of that logic lives in https://github.com/openshift/cluster-machine-approver
OK I just did https://github.com/openshift/cluster-machine-approver/pull/150 - I hope that's helpful.
@cgwalters @stlaz thanks for looking at this, I'm having a look at the cluster-machine-approver.
And yes, @cgwalters, MicroShift doesn't manage the OS today; I'm not sure whether we would need to look at that at some point.
@fzdarsky
Hmm, @cgwalters, so it looks like we were proposing the same mechanism used in OpenShift, but in OpenShift we have the cluster-machine-approver, which makes additional checks based on the OpenShift Machine API (which we wouldn't have on MicroShift) and other Node details.
It probably makes sense to start with the simpler kube-controller-manager approval, and in the future it could make sense to extend the CSR made by the kubelet with TPM details (for example), and then have a MicroShift-specific approver that can also validate the CSR against a CA or, again, via the TPM hardware module on the masters.
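For illustration only, here is a minimal sketch of what such a MicroShift-specific approver loop could look like, assuming client-go and the standard kubelet client bootstrap signer. The kubeconfig path, the Reason/Message strings, and the placement of any TPM or CA checks are assumptions, not anything MicroShift actually implements.

```go
// Illustrative sketch: a minimal approver loop for kubelet client bootstrap
// CSRs, roughly where a MicroShift-specific approver could add TPM or
// CA-based checks before approving.
package main

import (
	"context"
	"log"
	"time"

	certv1 "k8s.io/api/certificates/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path; adjust to the actual MicroShift install.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/var/lib/microshift/resources/kubeadmin/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		csrs, err := client.CertificatesV1().CertificateSigningRequests().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			log.Printf("listing CSRs: %v", err)
			time.Sleep(10 * time.Second)
			continue
		}
		for i := range csrs.Items {
			csr := &csrs.Items[i]
			if !isPendingKubeletBootstrapCSR(csr) {
				continue
			}
			// Additional checks (TPM attestation, CA-based validation of an
			// extended CSR) would go here before approving.
			csr.Status.Conditions = append(csr.Status.Conditions, certv1.CertificateSigningRequestCondition{
				Type:    certv1.CertificateApproved,
				Status:  corev1.ConditionTrue,
				Reason:  "AutoApproved",
				Message: "kubelet bootstrap CSR approved",
			})
			if _, err := client.CertificatesV1().CertificateSigningRequests().UpdateApproval(context.TODO(), csr.Name, csr, metav1.UpdateOptions{}); err != nil {
				log.Printf("approving %s: %v", csr.Name, err)
			}
		}
		time.Sleep(30 * time.Second)
	}
}

// isPendingKubeletBootstrapCSR keeps only unhandled CSRs created via a
// bootstrap token for the kubelet client signer.
func isPendingKubeletBootstrapCSR(csr *certv1.CertificateSigningRequest) bool {
	if len(csr.Status.Conditions) > 0 {
		return false // already approved or denied
	}
	if csr.Spec.SignerName != certv1.KubeAPIServerClientKubeletSignerName {
		return false
	}
	for _, g := range csr.Spec.Groups {
		if g == "system:bootstrappers" {
			return true
		}
	}
	return false
}
```

The kube-controller-manager's built-in CSR approver already covers the basic bootstrap case; a dedicated approver like the sketch above would only be justified by the extra checks the comment describes.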
Assuming the bootstrap token is transferred and maintained securely, I don't think additional checks against a CA add much value. TPMs, on the other hand, can be quite powerful, but they require investment and aren't available everywhere.
One problem in OCP today is that we transfer that token via served Ignition; xref https://github.com/openshift/machine-config-operator/pull/736. (Long story short, I think we want to move all secrets into the bootstrap Ignition, which is part of the cloud metadata.)
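To make the "transferred and maintained securely" point concrete, here is a small sketch of creating a short-lived bootstrap token as a Kubernetes Secret, so a leaked token is only useful for a limited window. The token values, the extra group name, and the kubeconfig path are placeholders, not anything MicroShift actually ships.

```go
// Illustrative sketch: create a short-lived bootstrap token that a joining
// node could use for its initial kubelet CSR, limiting exposure if the
// token leaks (e.g. via served Ignition or cloud metadata).
package main

import (
	"context"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path; adjust to the actual MicroShift install.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/var/lib/microshift/resources/kubeadmin/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Placeholder values; real token-id/token-secret must be randomly
	// generated ([a-z0-9]{6} and [a-z0-9]{16} respectively).
	tokenID, tokenSecret := "abcdef", "0123456789abcdef"

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			// Bootstrap token secrets must be named "bootstrap-token-<token-id>".
			Name:      "bootstrap-token-" + tokenID,
			Namespace: metav1.NamespaceSystem,
		},
		Type: corev1.SecretTypeBootstrapToken,
		StringData: map[string]string{
			"token-id":     tokenID,
			"token-secret": tokenSecret,
			// A short expiration keeps a leaked token useful only briefly.
			"expiration":                     time.Now().Add(1 * time.Hour).UTC().Format(time.RFC3339),
			"usage-bootstrap-authentication": "true",
			// Extra group an approver could match on; illustrative name.
			"auth-extra-groups": "system:bootstrappers:microshift:worker",
		},
	}

	if _, err := client.CoreV1().Secrets(metav1.NamespaceSystem).Create(context.TODO(), secret, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Printf("created bootstrap token %s.%s (expires in 1h)", tokenID, tokenSecret)
}
```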
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
@oglok: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think we should close this, since we don't intend to support multi-node.
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
/kind design
/close
We've decided not to support multi-node.
Hi team, with full respect, I think this decision is far from strategic. As a Red Hat solution architect I work with customers who are comparing MicroShift with similar existing solutions. Some of those solutions, such as k3s, offer the option of setting up a multi-node cluster.
Offering the option of a minimal multi-node cluster (for example, adding a second HA master or multiple workers) could be a true game changer, even for edge scenarios where application high availability cannot be managed using existing tools and application-level HA features to support failover.
For this reason I suggest reconsidering this RFE.
@giannisalinetti can you provide more detail on which scenarios cannot be managed using existing tools and application HA features to support failover? If you want two control hosts to provide HA, I believe you will have less overall reliability than two single-node solutions. If you scale out to additional workers/control hosts, it also demands extra workflow during upgrade scenarios and significantly increases resource consumption, to the point where it begins to look like traditional standalone OpenShift. At the moment, we are focusing on ensuring we can meet a minimal resource budget for single-node scenarios.
Any additional detail you can provide on where MicroShift would be a fit but standalone OpenShift would not is always appreciated. If it's entirely about resource consumption, keep in mind that as you grow clusters, consumption increases at all levels (control hosts and the networking SDN).
@derekwaynecarr I do not have a specific scenario to share now, but I can assure you that at least one strategic customer of mine (I can't write the name for privacy reasons, but they are one of the main customers in Italy and they are looking forward to the productized version) is comparing MicroShift's evolution with similar products and has already asked me why this feature cannot be included when competitor alternatives like k3s offer it.
@giannisalinetti I would be interested in the customer's feedback to @derekwaynecarr's answer. Maybe we can do that privately (email, chat, etc.) so we can have all of the details.
@dhellmann @derekwaynecarr sure! We can discuss it by email if you agree: [email protected]