k0sctl CoreOS support

Hey folks, I was wondering if you plan to support CoreOS (Fedora and RHEL). Any idea what it would take to add this?

As of now, CoreOS is detected as Fedora, so k0sctl tries to install packages using yum, which is not present on CoreOS, and the installation fails.

Thanks

Mar 20 '21 23:03 kaplan-michael

So I have done some more digging, and it seems there should be a distinction in OS detection. That is, if the OS is detected as CoreOS, k0sctl shouldn't try to install kubectl using yum but should fall back to the curl method. (Alternatively it could use rpm-ostree, but the node needs to be rebooted for the changes to take effect, so some kind of "reboot and wait until the node is back up" function would be needed.)
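
To illustrate the curl fallback, here is a minimal sketch that only builds the shell command to run on the node. The helper name kubectlDownloadCmd, the destination path, and the version used in main are assumptions for illustration; this is not k0sctl code, and running the command on a real host would likely also need sudo.

```go
package main

import (
	"fmt"
	"strings"
)

// kubectlDownloadCmd builds a shell command that downloads a kubectl binary
// with curl and marks it executable. The function name and the idea of
// passing the destination path are illustrative assumptions.
func kubectlDownloadCmd(version, arch, dest string) string {
	url := fmt.Sprintf("https://dl.k8s.io/release/%s/bin/linux/%s/kubectl", version, arch)
	return strings.Join([]string{
		fmt.Sprintf("curl -fsSL -o %s %s", dest, url),
		fmt.Sprintf("chmod +x %s", dest),
	}, " && ")
}

func main() {
	// Version, architecture, and path are example values only.
	fmt.Println(kubectlDownloadCmd("v1.20.5", "amd64", "/usr/local/bin/kubectl"))
}
```

Keeping the command construction separate from its execution makes the fallback easy to test without a CoreOS node.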

Mar 21 '21 16:03 kaplan-michael

@kaplan-michael to add support for CoreOS, maybe have a look at how Alpine support is implemented: https://github.com/k0sproject/k0sctl/blob/main/configurer/linux/alpine.go

Alpine does have package management, so for CoreOS you'd need some way to pick up the package names passed to

func (l Alpine) InstallPackage(h os.Host, pkg ...string) error {

and use the proper download URL etc.

Mar 24 '21 09:03 jnummelin

So I had a look at it. For it to be properly implemented, the rig library would have to provide more checks, as CoreOS has the ID of the system it is based on (ID=fedora). The two are distinguished by VARIANT_ID or VARIANT, so rig would ideally have to parse those too. @jnummelin do you have any guidance about it? Thanks in advance.

Mar 25 '21 22:03 kaplan-michael

@kaplan-michael yeah, sounds like we need more info in rig's OSVersion struct: https://github.com/k0sproject/rig/blob/main/osversion.go

The parsing happens at https://github.com/k0sproject/rig/blob/main/resolver.go#L110, and it should not be too big an effort to make it understand more fields.

I do not have a CoreOS box at hand, could you share an example of the /etc/os-release file it has?

Mar 26 '21 13:03 jnummelin

I'll have a look at it over the weekend.

Here is the /etc/os-release from the box I have on hand.

NAME=Fedora
VERSION="33.20210301.3.1 (CoreOS)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora CoreOS 33.20210301.3.1"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:33"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=33
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=33
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='33.20210301.3.1'
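
As a rough sketch of what parsing the extra fields could look like, the following reads os-release style KEY=value lines into a map and checks VARIANT_ID. The function names parseOSRelease and isFedoraCoreOS are made up for this example and are not part of rig.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseOSRelease turns os-release style KEY=value lines into a map,
// stripping optional single or double quotes around each value.
// Hypothetical helper, not rig's actual parser.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue // not a KEY=value line
		}
		fields[parts[0]] = strings.Trim(parts[1], `"'`)
	}
	return fields
}

// isFedoraCoreOS reports whether the parsed fields describe Fedora CoreOS.
func isFedoraCoreOS(fields map[string]string) bool {
	return fields["ID"] == "fedora" && fields["VARIANT_ID"] == "coreos"
}

func main() {
	sample := "NAME=Fedora\nID=fedora\nVARIANT=\"CoreOS\"\nVARIANT_ID=coreos\nOSTREE_VERSION='33.20210301.3.1'\n"
	fields := parseOSRelease(sample)
	fmt.Println(isFedoraCoreOS(fields), fields["OSTREE_VERSION"])
}
```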

Mar 26 '21 14:03 kaplan-michael

Rig already parses the PRETTY_NAME field, so a specific OS implementation could use an "ID equals 'fedora' and name contains 'CoreOS'" condition check for detection without needing additions to rig. Not that I'm against enhancing it. :)
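
A sketch of such a condition check might look like this. The osVersion struct is a hypothetical stand-in for whatever rig exposes, and its field names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// osVersion is a hypothetical stand-in for the OS information rig provides;
// the field names are assumptions, not rig's actual definition.
type osVersion struct {
	ID   string // os-release ID, e.g. "fedora"
	Name string // os-release PRETTY_NAME, e.g. "Fedora CoreOS 33.20210301.3.1"
}

// matchesCoreOS implements the "ID equals fedora and name contains CoreOS" check.
func matchesCoreOS(v osVersion) bool {
	return strings.EqualFold(v.ID, "fedora") && strings.Contains(v.Name, "CoreOS")
}

func main() {
	fmt.Println(matchesCoreOS(osVersion{ID: "fedora", Name: "Fedora CoreOS 33.20210301.3.1"}))
	fmt.Println(matchesCoreOS(osVersion{ID: "fedora", Name: "Fedora 33 (Server Edition)"}))
}
```

Since a plain Fedora check would presumably also match a CoreOS host, a check like this would have to be evaluated before the generic Fedora one.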

Mar 26 '21 20:03 jnummelin

That sounds good, although I think the variant fields are a more reliable source. It would be great to enhance rig in the future.

Mar 26 '21 21:03 kaplan-michael

OK, I can reliably install the package using curl. I would like to implement it using rpm-ostree, as that would keep the packages updated, but the issue I'm running into is the need to reboot, which breaks the connection. I'm not sure whether that could be implemented easily. I wasn't able to find out whether rig can reconnect after the connection is lost, similar to what Ansible does with reboot, where it waits until the host is back up and reconnects. If not, I'll send a PR with curl only for now.

Mar 26 '21 21:03 kaplan-michael

So here is the commit be8f6cf. I would appreciate it if you could have a look at it; I'll test it more before I send the PR.

BTW: I still have issues with the etcd user being ignored in my configs. The intention is to run etcd under "k0s-etcd" instead of "etcd". I can share the config if it helps, but it is mostly taken from the examples.

Mar 26 '21 22:03 kaplan-michael

@kaplan-michael in general the work in the linked commit looks pretty good.

I'm wasn't able to find if rig can reconnect after the connection was lost?

IIRC rig does not do any "auto-reconnect" but not 100% sure. Maybe we can tackle that at k0sctl level, so once we know the reboot is happening we can tell rig to reconnect. Something like (pseudo-ish code):

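func (l Coreos) InstallPackage(h os.Host, pkg ...string) error {
	// --reboot restarts the node so the layered packages take effect; the
	// reboot may drop the connection before Execf returns.
	err := h.Execf("sudo rpm-ostree install %s --reboot", strings.Join(pkg, " "))
	if err != nil {
		return err
	}
	// Reconnect after the reboot (a real version would need to wait and retry).
	return h.Connect()
}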

@kke might have some better ideas for the reconnect part :)

Mar 31 '21 13:03 jnummelin

This is from another project using rig:

// Reconnect disconnects and reconnects the host's connection
func (h *Host) Reconnect() error {
	h.Disconnect()

	log.Infof("%s: waiting for reconnection", h)
	return retry.Do(
		func() error {
			return h.Connect()
		},
		retry.DelayType(retry.CombineDelay(retry.FixedDelay, retry.RandomDelay)),
		retry.MaxJitter(time.Second*2),
		retry.Delay(time.Second*3),
		retry.Attempts(60),
	)
}

// Reboot reboots the host and waits for it to become responsive
func (h *Host) Reboot() error {
	log.Infof("%s: rebooting", h)
	if err := h.Configurer.Reboot(h); err != nil {
		return err
	}
	log.Infof("%s: waiting for host to go offline", h)
	if err := h.waitForHost(false); err != nil {
		return err
	}
	h.Disconnect()

	log.Infof("%s: waiting for reconnection", h)
	if err := h.Reconnect(); err != nil {
		return fmt.Errorf("unable to reconnect after reboot: %w", err)
	}

	log.Infof("%s: waiting for host to become active", h)
	if err := h.waitForHost(true); err != nil {
		return err
	}

	if err := h.Reconnect(); err != nil {
		return fmt.Errorf("unable to reconnect after reboot: %s", err.Error())
	}

	return nil
}

// when state is true wait for host to become active, when state is false, wait for connection to go down
func (h *Host) waitForHost(state bool) error {
	err := retry.Do(
		func() error {
			err := h.Exec("echo")
			if !state && err == nil {
				return fmt.Errorf("still online")
			} else if state && err != nil {
				return fmt.Errorf("still offline")
			}
			return nil
		},
		retry.DelayType(retry.CombineDelay(retry.FixedDelay, retry.RandomDelay)),
		retry.MaxJitter(time.Second*2),
		retry.Delay(time.Second*3),
		retry.Attempts(60),
	)
	if err != nil {
		if state {
			return fmt.Errorf("failed to wait for host to become active")
		}
		return fmt.Errorf("failed to wait for host to go offline")
	}
	return nil
}

Apr 07 '21 07:04 kke

So just to update, I still haven't managed to find the time to do it.

Apr 28 '21 21:04 kaplan-michael

OK, so to update: kubectl is in a separate rpm repo, and the reboot and reconnect (similar to what @kke mentioned) would require some type trickery for which I don't have the time. I'll send a PR only for the curl variant.

BTW: you would probably be better off building a custom CoreOS image when installing.

Oct 27 '21 13:10 kaplan-michael

Kubectl is already embedded in k0s nowadays, so there's no need to install it anymore. k0sctl invokes it through k0s, for example: k0s kubectl get nodes.

Only the smoke-tests use a real kubectl but that is only used on the host running the tests.

Oct 29 '21 07:10 kke

OK, I don't have much time to fiddle with it more. The PR should just add the definition for CoreOS as a supported system.

Jan 30 '22 02:01 kaplan-michael
