atomic-system-containers icon indicating copy to clipboard operation
atomic-system-containers copied to clipboard

Issues running kubernetes on Fedora Atomic Host

Open kryachkov opened this issue 8 years ago • 1 comments

All steps are done according to http://www.projectatomic.io/docs/gettingstarted/ and http://www.projectatomic.io/blog/2017/11/migrating-kubernetes-on-fedora-atomic-host-27/

  1. kubernetes-apiserver

Initial values:

[root@atomic-node1 easyrsa3]# atomic host status
State: idle
Deployments:
● fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.25 (2017-12-10 18:40:57)
                    Commit: a2b80278eea897eb1fec7d008b18ef74941ff5a54f86b447a2f4da0451c4291a
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4

  fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.16 (2017-11-28 23:08:35)
                    Commit: 86727cdbc928b7f7dd0e32f62d3b973a8395d61e0ff751cfea7cc0bc5222142f
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4

Generate server certificates:

[root@atomic-node1 ~]# curl -L -O https://storage.googleapis.com/kubernetes-release/easy-rsa/easy-rsa.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43444  100 43444    0     0  43444      0  0:00:01 --:--:--  0:00:01 86714
[root@atomic-node1 ~]# tar xzf easy-rsa.tar.gz
[root@atomic-node1 ~]# cd easy-rsa-master/easyrsa3
[root@atomic-node1 easyrsa3]# ./easyrsa init-pki

init-pki complete; you may now create a CA or requests.
Your newly created PKI dir is: /root/easy-rsa-master/easyrsa3/pki

[root@atomic-node1 easyrsa3]# MASTER_IP=10.0.1.4
[root@atomic-node1 easyrsa3]#
[root@atomic-node1 easyrsa3]# ./easyrsa --batch "--req-cn=${MASTER_IP}@`date +%s`" build-ca nopass
Generating a 2048 bit RSA private key
...............................................................................+++
..+++
writing new private key to '/root/easy-rsa-master/easyrsa3/pki/private/ca.key'
-----
[root@atomic-node1 easyrsa3]# ./easyrsa --subject-alt-name="IP:${MASTER_IP}" build-server-full server nopass
Generating a 2048 bit RSA private key
...........................+++
................................................................................................................................................+++
writing new private key to '/root/easy-rsa-master/easyrsa3/pki/private/server.key'
-----
Using configuration from /root/easy-rsa-master/easyrsa3/openssl-1.0.cnf
Can't open /root/easy-rsa-master/easyrsa3/pki/index.txt.attr for reading, No such file or directory
139822177810240:error:02001002:system library:fopen:No such file or directory:crypto/bio/bss_file.c:74:fopen('/root/easy-rsa-master/easyrsa3/pki/index.txt.attr','r')
139822177810240:error:2006D080:BIO routines:BIO_new_file:no such file:crypto/bio/bss_file.c:81:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'server'
Certificate is to be certified until Dec 18 07:17:26 2027 GMT (3650 days)

Write out database with 1 new entries
Data Base Updated

Install kubernetes-apiserver:

[root@atomic-node1 easyrsa3]# atomic install --system --system-package=no --name kube-apiserver registry.fedoraproject.org/f27/kubernetes-apiserver

Note: Switching from the 'docker' backend to the 'ostree' backend based on the 'atomic.type' label in the image.  You can use --storage to override this behaviour.

Getting image source signatures
Skipping fetch of repeat blob sha256:04331e646521ddb577d113f3c103aef620cc4451641452c347864298669f8572
Copying blob sha256:5393e05eae68f3221008c7aa9e7677721304f64bca97c10dcf54b8a0fb7efa55
 109.75 MB / 109.75 MB [====================================================] 2s
Copying blob sha256:a53a9d8d02712e289074cf465b2d7dbe4d5620269cffa78215da2625871a7ba4
 52.08 MB / 52.08 MB [======================================================] 1s
Copying config sha256:de8dfe370000134bd1154351a6a97807195be57cdceff27c3e83b1607c8fab66
 2.98 KB / 2.98 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
Extracting to /var/lib/containers/atomic/kube-apiserver.0
Created file /etc/kubernetes/apiserver
Created file /etc/kubernetes/config
Created file /usr/local/bin/kubectl
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/kube-apiserver.conf
systemctl enable kube-apiserver

Note the permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# ls -al /etc | grep kube
drwx------.  2 root root       37 Dec 20 08:21 kubernetes

Copy certificates to /etc/kubernetes/certs

[root@atomic-node1 easyrsa3]# mkdir /etc/kubernetes/certs
[root@atomic-node1 easyrsa3]# for i in {pki/ca.crt,pki/issued/server.crt,pki/private/server.key}; do cp $i /etc/kubernetes/certs; done
[root@atomic-node1 easyrsa3]# chown -R kube:kube /etc/kubernetes/certs

Add KUBE_API_ARGS to /etc/kubernetes/apiserver

KUBE_API_ARGS="--tls-cert-file=/etc/kubernetes/certs/server.crt --tls-private-key-file=/etc/kubernetes/certs/server.key --client-ca-file=/etc/kubernetes/certs/ca.crt --service-account-key-file=/etc/kubernetes/certs/server.crt"

Run kube-apiserver

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver

See it fails

[root@atomic-node1 easyrsa3]# systemctl status kube-apiserver
● kube-apiserver.service - kubernetes-apiserver
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-12-20 08:28:44 CET; 56s ago
  Process: 1777 ExecStart=/bin/runc --systemd-cgroup run kube-apiserver (code=exited, status=1/FAILURE)
 Main PID: 1777 (code=exited, status=1/FAILURE)
      CPU: 37ms

Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:28:44 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Start request repeated too quickly.
Dec 20 08:28:44 atomic-node1.local systemd[1]: Failed to start kubernetes-apiserver.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.

Check journalctl

Dec 20 08:28:43 atomic-node1.local runc[1777]: container_linux.go:274: starting container process caused "exec: \"/usr/bin/kube-apiserver-docker.sh\": permission denied"
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:28:43 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:28:44 atomic-node1.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termina
Dec 20 08:28:44 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:28:44 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix permissions of /usr/bin/kube-apiserver-docker.sh in container rootfs

[root@atomic-node1 easyrsa3]# chmod +x /var/lib/containers/atomic/kube-apiserver/rootfs/usr/bin/kube-apiserver-docker.sh

Run kube-apiserver

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver

See it fails. And check journalctl

Dec 20 08:33:14 atomic-node1.local runc[1924]: I1220 07:33:14.009558       1 server.go:112] Version: v1.7.3
Dec 20 08:33:14 atomic-node1.local runc[1924]: W1220 07:33:14.009940       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:33:14 atomic-node1.local runc[1924]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:33:14 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:33:14 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# chmod +x /etc/kubernetes

Run kube-apiserver and see it fails

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver
Dec 20 08:34:46 atomic-node1.local runc[2049]: I1220 07:34:46.266909       1 server.go:112] Version: v1.7.3
Dec 20 08:34:46 atomic-node1.local runc[2049]: W1220 07:34:46.267281       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:34:46 atomic-node1.local runc[2049]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:34:46 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:34:46 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix uid and gid in /var/lib/containers/atomic/kube-apiserver/config.json

        "user": {
            "uid": 996,
            "gid": 994
        },

Run kube-apiserver and see it works

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver
[root@atomic-node1 easyrsa3]# systemctl status kube-apiserver
● kube-apiserver.service - kubernetes-apiserver
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-12-20 08:37:17 CET; 58s ago
 Main PID: 2200 (runc)
    Tasks: 8 (limit: 4915)
   Memory: 6.1M
      CPU: 32ms
   CGroup: /system.slice/kube-apiserver.service
           └─2200 /bin/runc --systemd-cgroup run kube-apiserver

Change permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# chmod -x /etc/kubernetes

Restart kube-apiserver and check journalctl

Dec 20 08:39:40 atomic-node1.local runc[2462]: I1220 07:39:40.765857       1 server.go:112] Version: v1.7.3
Dec 20 08:39:40 atomic-node1.local runc[2462]: W1220 07:39:40.766524       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:39:40 atomic-node1.local runc[2462]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:39:40 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:39:41 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:39:41 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Conclusions:

  1. /usr/bin/kube-apiserver-docker.sh inside kube-apiserver container must have +x permissions (applies to all other components of kubernetes)
  2. /etc/kubernetes on host machine must have +x permissions 2.1 After reboot /etc/kubernetes permissions should not be reset to 700

kryachkov avatar Dec 20 '17 07:12 kryachkov

Thanks for the report @kryachkov!

@jasonbrooks is this the same issue as found in other recent issues such as https://github.com/projectatomic/atomic-system-containers/issues/156?

ashcrow avatar Dec 30 '17 02:12 ashcrow