coreos-kubernetes
Controller and worker automated installation scripts are missing etcd certificates location
Hi. I have two etcd2 servers configured with TLS certificates. While trying to configure the controller or worker script to work with these servers, I found there is no way to specify the TLS certificate locations, which causes the script to loop endlessly, rotating connections between my two etcd2 servers.
I'm talking about the scripts located at https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic
Am I right in my assumptions? If I am... I don't mind patching the script myself and adding the relevant code to support the required certificates, and then I'll upload a patch to this issue. I just want to know that I'm on the right track.
More information at http://stackoverflow.com/questions/39743901/installing-kubernetes-on-coreos-with-rkt-and-automated-script
Thanks!
I think you may have to look in more than one place to fix it, but, for example, flannel needs these values set to work with a secure etcd cluster:
FLANNELD_ETCD_ENDPOINTS=$ETCD_ENDPOINTS
FLANNELD_ETCD_CAFILE=/path_to_ca_file
FLANNELD_ETCD_CERTFILE=/path_to_cert_file
FLANNELD_ETCD_KEYFILE=/path_to_key_file
in the controller script at line 759
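On CoreOS these FLANNELD_* variables are usually handed to flanneld through a systemd drop-in. A minimal sketch, assuming the drop-in path, endpoint hostnames, and cert locations shown here (they are illustrations, not values from the repo):

```ini
# hypothetical drop-in: /etc/systemd/system/flanneld.service.d/40-etcd-tls.conf
# points flannel at a TLS-secured etcd cluster
[Service]
Environment="FLANNELD_ETCD_ENDPOINTS=https://etcd1.example.com:2379,https://etcd2.example.com:2379"
Environment="FLANNELD_ETCD_CAFILE=/etc/ssl/etcd/ca.pem"
Environment="FLANNELD_ETCD_CERTFILE=/etc/ssl/etcd/etcd1.pem"
Environment="FLANNELD_ETCD_KEYFILE=/etc/ssl/etcd/etcd1-key.pem"
```

After adding a drop-in like this you would run `systemctl daemon-reload` and restart flanneld.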
Hi. I updated the controller-install.sh script with the following new environment variables:

# tls certs
ETCD_CERT_FILE="/etc/ssl/etcd/etcd1.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd1-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"

# enable/disable etcd tls certificate support
ETCD_CLIENT_CERT_AUTH=true

# re-create all configuration files even if they exist
OVERWRITE_ALL_FILES=true

# i remember calico needs hostname-based certs
CONTROLLER_HOSTNAME="coreos-2.tux-in.com"

# for the kubelet mount point
ETCD_CERT_ROOT_DIR="/etc/ssl/etcd"
so.. I'm new to kubernetes. I'd love some feedback. going to work on the worker script now.
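With those variables set, a quick sanity check is to hand them to etcdctl (v2 API) and query cluster health before running the installer. A sketch; `etcd_tls_flags` is a hypothetical helper, not something in the script:

```shell
#!/bin/sh
# hypothetical helper: assemble etcdctl (v2 API) TLS flags from the
# environment variables defined above
etcd_tls_flags() {
  printf '%s %s %s' \
    "--ca-file=$ETCD_TRUSTED_CA_FILE" \
    "--cert-file=$ETCD_CERT_FILE" \
    "--key-file=$ETCD_KEY_FILE"
}

# example usage (needs a reachable TLS-secured etcd cluster):
#   etcdctl --endpoints "$ETCD_ENDPOINTS" $(etcd_tls_flags) cluster-health
```

If `cluster-health` reports all members healthy with these flags, the cert paths are good and any remaining failures are in the generated manifests, not the certificates.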
I noticed that kube-policy-controller fails. trying to find out why.
Related issue: https://github.com/coreos/coreos-kubernetes/issues/694
@kfirufk I have fixed this for kube-aws in my case. Please check the changes in the attached script. You may also have to mount the certs for kubelet. I have all the certs, including etcd's, in "/etc/kubernetes".
@camilb - hi. i checked your script. in the kube-policy-controller manifest creation you specified the location of the etcd ssl certificate files, but you did not create a mount point for them (/etc/etcd/ssl) so they would be available in the container. that's where I'm stuck. how did it work without it?
@kfirufk Forgot to update the path there, here is my working config:
- path: /etc/kubernetes/manifests/kube-controller-manager.yaml
  content: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: kube-controller-manager
      namespace: kube-system
    spec:
      containers:
      - name: kube-controller-manager
        image: {{.HyperkubeImageRepo}}:{{.K8sVer}}
        command:
        - /hyperkube
        - controller-manager
        - --master=http://127.0.0.1:8080
        - --leader-elect=true
        - --service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
        - --root-ca-file=/etc/kubernetes/ssl/ca.pem
        - --cloud-provider=aws
        resources:
          requests:
            cpu: 200m
        livenessProbe:
          httpGet:
            host: 127.0.0.1
            path: /healthz
            port: 10252
          initialDelaySeconds: 15
          timeoutSeconds: 15
        volumeMounts:
        - mountPath: /etc/kubernetes/ssl
          name: ssl-certs-kubernetes
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: ssl-certs-host
          readOnly: true
      hostNetwork: true
      volumes:
      - hostPath:
          path: /etc/kubernetes/ssl
        name: ssl-certs-kubernetes
      - hostPath:
          path: /usr/share/ca-certificates
        name: ssl-certs-host
You can check the controller configuration here: https://github.com/camilb/coreos-kubernetes/blob/1.4.0-ha/multi-node/aws/pkg/config/templates/cloud-config-controller
Some paths are different. Also this is for "kube-aws" tool.
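For the etcd client certs discussed above, the manifest needs both a volumeMount inside the container and a matching hostPath volume in the pod spec, mirroring the ssl-certs-kubernetes pair. A sketch, assuming the certs live under /etc/etcd/ssl as mentioned earlier (the volume name is illustrative):

```yaml
# added under the container's volumeMounts:
- mountPath: /etc/etcd/ssl
  name: etcd-certs
  readOnly: true
# and the matching entry under the pod's volumes:
- hostPath:
    path: /etc/etcd/ssl
  name: etcd-certs
```

Without the volumes entry, the mountPath referenced by the container has nothing backing it, which matches the symptom described above.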
@camilb - that's actually what I did, more or less: created a volume and a mount point, but after that the kube-controller-manager won't even try to start up. maybe i missed some indentation or made a typo.. i'll diff the two and see what's different. thanks
@kfirufk also check the hyperkube version. I think "v1.4.0_coreos.0" is missing the CNI binaries. Try using v1.4.0_coreos.2
@camilb - thanks a lot. updated versions and fixed typos and now it works. just please confirm that these are all the rkt containers that need to be up and running: kube-apiserver, kube-proxy, hyperkube, kube-policy-controller, leader-elector, kube-controller-manager, kube-scheduler.
anyhow attaching the latest version.
hi! :) so first.. i had some static certificate paths left in the controller installer, so I fixed that. I also did the worker installer script. didn't fully test it but it seems ok for now.
ok.. the worker-install script starts kube-proxy and hyperkube containers. but 'kubectl get node' shows only the master node, not the worker. working on it.
ok, i was wrong. the worker script is fine. I had provided CONTROLLER_ENDPOINT with a domain without the 'https://' prefix.
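A cheap guard in the install script could catch that mistake up front. A sketch; the variable name matches the one above, and the default value here is only an example:

```shell
#!/bin/sh
# hypothetical example value; in the real script this comes from the caller
CONTROLLER_ENDPOINT="${CONTROLLER_ENDPOINT:-https://coreos-2.tux-in.com}"

# refuse to continue unless the endpoint carries an explicit https:// scheme
case "$CONTROLLER_ENDPOINT" in
  https://*) echo "endpoint ok: $CONTROLLER_ENDPOINT" ;;
  *) echo "CONTROLLER_ENDPOINT must start with https://" >&2; exit 1 ;;
esac
```

Failing fast with a clear message beats silently registering no worker nodes.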
welp. I already had calico-node configured, so I didn't notice it's in the script as well. fixed calico-node to work with etcd tls certificates. still haven't verified all the pods' functionality, but this is what I have so far:
decided to create a github project instead of posting here whenever I modify the script. https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script
If you fork this repo, and create a pull request, the differences can easily be seen/reviewed and possibly merged into this repo (a lot easier than sharing zip files, or in an entirely separate project): https://help.github.com/articles/about-pull-requests
thanks for the info :) i'll be sure to do so next time. since each etcd2 server has its own certificate, and I don't really want to force kubernetes to install pods on specific servers, it complicates things: I'd need to store the certificates of all the etcd2 servers in the same directory on each server. so instead... I just opened another etcd2 listener, http://127.0.0.1:4001, and each service on the same server can use etcd2 via this address without needing a certificate.
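For reference, that split listener setup can be expressed with etcd2's standard environment variables: TLS on the external interface, plain HTTP only on loopback. A sketch; the drop-in path and hostname are assumptions:

```ini
# hypothetical drop-in: /etc/systemd/system/etcd2.service.d/40-listen.conf
# serve TLS on the public interface, plain HTTP only on 127.0.0.1
[Service]
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379,http://127.0.0.1:4001"
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://coreos-2.tux-in.com:2379"
```

Note that the loopback listener is unauthenticated, so this trades per-host cert management for trusting every local process on the node.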