[RFE] Multi-node Request for Enhancement

Open oglok opened this issue 3 years ago • 12 comments

This commit only describes the addition of new compute nodes to an existing MicroShift cluster. Highly available control plane will be described in later PRs.

Signed-off-by: Ricardo Noriega [email protected]

This enhancement proposal addresses part of the #460 epic.

oglok avatar Dec 13 '21 10:12 oglok

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please ask for approval from oglok after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Dec 13 '21 10:12 openshift-ci[bot]

There's a lot going on here. Something I am struggling to wrap my head around is that to me, a key point of "OpenShift 4" is that the cluster manages the OS. That isn't handled by MicroShift today (right?).

Now one thing I was asked to comment on here is the relationship to OCP node join. What I would say is basically all of that logic lives in https://github.com/openshift/cluster-machine-approver

OK I just did https://github.com/openshift/cluster-machine-approver/pull/150 - I hope that's helpful.

cgwalters avatar Dec 17 '21 16:12 cgwalters

@cgwalters @stlaz thanks for looking at this, I'm having a look at the cluster-machine-approver.

mangelajo avatar Jan 14 '22 09:01 mangelajo

and yes, @cgwalters MicroShift doesn't manage the OS today, not sure if we would need to look at that at some point.

mangelajo avatar Jan 14 '22 10:01 mangelajo

@fzdarsky

mangelajo avatar Jan 14 '22 10:01 mangelajo

Hmm @cgwalters, so it looks like we were proposing the same mechanism used in OpenShift. But in OpenShift the cluster-machine-approver makes additional checks based on the OpenShift Machine API (which we wouldn't have on MicroShift) and on other Node details.

It probably makes sense to use the simpler kube-controller-manager to start with. In the future it could make sense to extend the CSR made by the kubelet with TPM details (for example), and then have a specific MicroShift approver that can also check the CSR based on a CA or, again, via the TPM hardware module on the masters.
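As a rough illustration of the kind of check a node-join approver performs on a kubelet client CSR, here is a minimal Python sketch. It is not MicroShift or OpenShift code. The `NodeClientCSR` class and both functions are hypothetical names invented for this example, and the group-membership test is a simplified stand-in for the RBAC authorization a real cluster performs. The field checks (signer name, `system:nodes` organization, `system:node:` common-name prefix, allowed key usages) follow my understanding of what Kubernetes expects of kubelet client certificates.

```python
from dataclasses import dataclass, field

# Signer name Kubernetes uses for kubelet client certificates.
KUBELET_CLIENT_SIGNER = "kubernetes.io/kube-apiserver-client-kubelet"
REQUIRED_USAGES = frozenset({"digital signature", "client auth"})
OPTIONAL_USAGES = frozenset({"key encipherment"})


@dataclass
class NodeClientCSR:
    """Hypothetical, simplified view of a kubelet client CSR."""
    signer_name: str
    common_name: str            # subject CN from the X.509 request
    organizations: list         # subject O values
    usages: frozenset           # requested key usages
    requester_groups: frozenset  # groups of the identity that submitted the CSR
    san_entries: list = field(default_factory=list)  # DNS/IP/email/URI SANs


def is_node_client_csr(csr):
    """Return True if the request has the shape of a kubelet client CSR."""
    usages = set(csr.usages)
    return (
        csr.signer_name == KUBELET_CLIENT_SIGNER
        and csr.organizations == ["system:nodes"]
        and csr.common_name.startswith("system:node:")
        and not csr.san_entries  # client certificates carry no SANs
        and REQUIRED_USAGES <= usages <= (REQUIRED_USAGES | OPTIONAL_USAGES)
    )


def should_auto_approve(csr, bootstrap_group="system:bootstrappers"):
    """Approve only well-formed node CSRs submitted with a bootstrap identity.

    Assumption: a plain group check stands in for the RBAC authorization
    that a real cluster would perform.
    """
    return is_node_client_csr(csr) and bootstrap_group in csr.requester_groups


# Usage: a CSR from a (hypothetical) worker joining with a bootstrap token.
example = NodeClientCSR(
    signer_name=KUBELET_CLIENT_SIGNER,
    common_name="system:node:edge-worker-1",
    organizations=["system:nodes"],
    usages=frozenset({"digital signature", "client auth"}),
    requester_groups=frozenset({"system:bootstrappers", "system:authenticated"}),
)
print(should_auto_approve(example))
```

A TPM-aware approver, as suggested above, would add further conditions to `should_auto_approve` rather than replace these structural checks.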

mangelajo avatar Jan 14 '22 10:01 mangelajo

Assuming the bootstrap token is transferred and maintained securely, then I don't think any additional checks against a CA add much value. TPMs on the other hand can be quite powerful, but require investment and aren't available everywhere.

One problem in OCP today is we transfer that token via served Ignition - xref https://github.com/openshift/machine-config-operator/pull/736 (Long story short, I think we want to move all secrets into the bootstrap ignition which is part of the cloud metadata)

cgwalters avatar Jan 14 '22 22:01 cgwalters

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar May 18 '22 00:05 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

/remove-lifecycle stale

openshift-bot avatar Jun 17 '22 00:06 openshift-bot

@oglok: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Jun 17 '22 00:06 openshift-ci[bot]

I think we should close this, since we don't intend to support multi-node.

dhellmann avatar Jun 28 '22 12:06 dhellmann

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot avatar Jul 28 '22 13:07 openshift-bot

/kind design

/close

We've decided not to support multi-node.

dhellmann avatar Aug 21 '22 17:08 dhellmann

Hi team, with full respect, I think that this decision is far from being strategic. As a Red Hat solution architect, I work with customers who are comparing MicroShift with similar existing solutions. Some of those, k3s for example, allow setting up a multi-node cluster.

Giving the option of setting up a minimal multi-node cluster (for example, adding a second HA master or multiple workers) can be a true game changer, even for edge scenarios where application high availability cannot be managed using existing tools and application HA features to support failover.

For this reason I suggest reconsidering this RFE.

giannisalinetti avatar Sep 02 '22 16:09 giannisalinetti

@giannisalinetti can you provide more detail on which scenarios cannot be managed using existing tools and application HA features to support failover? If you want two control hosts to provide HA, I believe you will have less overall reliability than with two single-node solutions. Scaling out to additional workers/control hosts also demands workflow during upgrade scenarios and significantly increases resource consumption, to the point that it begins to look like traditional standalone OpenShift. At the moment, we are focusing on ensuring we can meet a minimal resource budget for single-node scenarios.

Any additional detail you can provide on where MicroShift would be a fit but standalone OpenShift would not is always appreciated. If it's entirely about resource consumption, keep in mind that as you grow clusters, consumption increases at all levels (control hosts and networking SDN).

derekwaynecarr avatar Sep 06 '22 13:09 derekwaynecarr

@derekwaynecarr I do not have a specific scenario to share right now, but I can assure you that at least one strategic customer of mine is comparing MicroShift's evolution with similar products and has already asked me why this feature cannot be included, as it is in competing alternatives like k3s. (I can't name them for privacy reasons, but they are one of the main customers in Italy and they are looking forward to the productized version.)

giannisalinetti avatar Sep 06 '22 14:09 giannisalinetti

@giannisalinetti I would be interested in the customer's feedback to @derekwaynecarr's answer. Maybe we can do that privately (email, chat, etc.) so we can have all of the details.

dhellmann avatar Sep 06 '22 14:09 dhellmann

@dhellmann @derekwaynecarr sure! We can discuss it by email if you agree: [email protected]

giannisalinetti avatar Sep 06 '22 16:09 giannisalinetti