
Work along issues & notes (eBook version 2020-09-01)

Open • bonsi opened this issue on Mar 21, 2021 • 1 comment

Posting issues, notes & suggestions as I work through the book:

My system:

❯ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal
❯ minikube version
minikube version: v1.18.0
commit: ec61815d60f66a6e4f6353030a40b12362557caa-dirty

Chapter 1

On Ubuntu, minikube start now uses "docker" as the default driver, not "virtualbox" as you imply in the sections afterwards. To use "virtualbox" as the driver, one would need to run minikube start --driver=virtualbox. On subsequent minikube starts, the selected driver is read from ~/.minikube and no longer needs to be specified.
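For reference, the commands involved look roughly like this (nothing here is specific to the book; minikube profile list just shows which driver a profile was created with):

    # Pick the VirtualBox driver explicitly on the first start:
    minikube start --driver=virtualbox
    # Later starts reuse the saved profile, so this is enough:
    minikube start
    # Check which driver the current profile uses:
    minikube profile list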

Chapter 2

Section: Installing Ansible

  • pip and python-dev are no longer available through the default repos. The default Python version for Ubuntu/Debian is currently 3.8, so the commands need to use pip3 and python3-dev instead (see the sketch after this list)
  • although you use Ansible v2.9.13 in the current version of the book, there's no mention of which Ansible versions would work (v3.x was released not too long ago)
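A sketch of the updated install steps on Ubuntu 20.04, assuming Ansible is still installed via pip as in the book:

    sudo apt update
    sudo apt install -y python3-pip python3-dev
    pip3 install ansible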

Chapter 3

Writing a Playbook to Build a Container Image

  • You pin the Solr version to "8.6.2" in vars/main.yml but then use version "8.3.1" later on in "Writing a Playbook to Test the Container Image", where you start the container with docker run -d -p 8983:8983 ansible-for-kubernetes/solr:8.3.1 (a corrected command is sketched below)
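Assuming the pinned 8.6.2 is the intended version, the test command would presumably become:

    docker run -d -p 8983:8983 ansible-for-kubernetes/solr:8.6.2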

Chapter 4

A Vagrantfile for local Infrastructure-as-Code

  • Unlike the previous chapters, the project directory name to use (cluster-local-vms) isn't mentioned up front; it only appears at the end of the chapter

Running the cluster build playbook

  • Running ansible-playbook -i inventory main.yml fails at TASK [geerlingguy.docker : Ensure dependencies are installed.] with the error:
fatal: [kube2]: FAILED! => {"cache_update_time": 1604516116, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'apt-transport-https' 'gnupg2'' failed: E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/a/apt/apt-transport-https_1.8.2.1_all.deb  404  Not Found [IP: 199.232.138.132 80]\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "rc": 100, "stderr": "E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/a/apt/apt-transport-https_1.8.2.1_all.deb  404  Not Found [IP: 199.232.138.132 80]\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "stderr_lines": ["E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/a/apt/apt-transport-https_1.8.2.1_all.deb  404  Not Found [IP: 199.232.138.132 80]", "E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following NEW packages will be installed:\n  apt-transport-https gnupg2\n0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.\nNeed to get 542 kB of archives.\nAfter this operation, 566 kB of additional disk space will be used.\nIgn:1 http://httpredir.debian.org/debian buster/main amd64 apt-transport-https all 1.8.2.1\nGet:2 http://httpredir.debian.org/debian buster/main amd64 gnupg2 all 2.2.12-1+deb10u1 [393 kB]\nErr:1 http://httpredir.debian.org/debian buster/main amd64 apt-transport-https all 1.8.2.1\n  404  Not Found [IP: 199.232.138.132 80]\nFetched 393 kB in 5s (75.7 kB/s)\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following NEW packages will be installed:", "  apt-transport-https gnupg2", "0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.", "Need to get 542 kB of archives.", "After this operation, 566 kB of additional disk space will be used.", "Ign:1 http://httpredir.debian.org/debian buster/main amd64 apt-transport-https all 1.8.2.1", "Get:2 http://httpredir.debian.org/debian buster/main amd64 gnupg2 all 2.2.12-1+deb10u1 [393 kB]", "Err:1 http://httpredir.debian.org/debian buster/main amd64 apt-transport-https all 1.8.2.1", "  404  Not Found [IP: 199.232.138.132 80]", "Fetched 393 kB in 5s (75.7 kB/s)"]}

for each host (kube1, kube2, kube3). The error is probably related to the apt cache age of the Vagrant boxes used, because I was able to fix it by running sudo apt update on each box (or Ansible-style: ansible -m command -a 'sudo apt update' -i inventory all). Running the playbook again then succeeds. An even better solution is to add the following to the pre_tasks of the playbook:

    - name: Fix stale APT cache.
      apt:
        update_cache: yes
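For context, a minimal sketch of where this sits in the cluster build playbook (the hosts/become lines and role list are assumptions; only the pre_tasks block is the actual fix):

    - hosts: all
      become: true

      pre_tasks:
        - name: Fix stale APT cache.
          apt:
            update_cache: yes

      roles:
        - geerlingguy.docker
        # ...remaining roles as in the book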

Testing the cluster with a deployment using Ansible

  • TASK [Create hello-k8s resources and wait until they are Ready.] fails with the error:
failed: [kube1] (item=hello-k8s-deployment.yml) => {"ansible_loop_var": "item", "changed": false, "item": "hello-k8s-deployment.yml", "msg": "Failed to get client due to HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbf6d942350>: Failed to establish a new connection: [Errno 111] Connection refused',))"}
failed: [kube1] (item=hello-k8s-service.yml) => {"ansible_loop_var": "item", "changed": false, "item": "hello-k8s-service.yml", "msg": "Failed to get client due to HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2114a50250>: Failed to establish a new connection: [Errno 111] Connection refused',))"}

Fixed this issue by downgrading/pinning the openshift Python package used by the test-deployment.yml playbook to v0.11.2, as outlined in this issue (a sketch of the pin is below).
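A minimal sketch of the pin, assuming the playbook installs the client via Ansible's pip module (the task name is mine):

    - name: Ensure a compatible openshift client is installed.
      pip:
        name: openshift
        version: 0.11.2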

Patching Flannel to use the right network interface

  • The (rather) short hint on how to create a patch could use some more explanation, e.g. specifying the exact command to use, such as diff -u kube-flannel.yml kube-flannel-virtualbox.yml > kube-flannel-patch.txt (see the sketch after this list)
  • The flannel DaemonSet is no longer called kube-flannel-ds-amd64 but kube-flannel-ds
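A sketch of creating and applying the patch by hand, assuming plain GNU diff/patch (the book applies the patch through Ansible, so the exact mechanism may differ):

    # Create the patch from the edited copy:
    diff -u kube-flannel.yml kube-flannel-virtualbox.yml > kube-flannel-patch.txt
    # Apply it to a pristine kube-flannel.yml:
    patch kube-flannel.yml < kube-flannel-patch.txt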

Chapter 5

Authenticating to the EKS Cluster via kubeconfig

The aws-iam-authenticator is no longer required when using aws-cli version 1.16.156 or later.
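With a recent aws-cli the kubeconfig entry can be generated directly; the cluster name and region below are placeholders:

    aws eks update-kubeconfig --name <cluster-name> --region us-east-1
    kubectl get nodes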

Chapter 6

(TODO in book)

Chapter 7

Manage Kind with Molecule

The default YAML files generated with molecule init scenario seem to have changed (quite a bit). Following along and making the suggested changes results in molecule test throwing an error:

TASK [Gathering Facts] *********************************************************
fatal: [molecule-test]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname molecule-test: Name or service not known", "unreachable": true}

I suspect this has to do with the missing connection: local in converge.yml, because after I modified converge.yml as outlined in the next subchapter of the book ("Test a playbook in Kind with Molecule"), the error went away.
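For reference, a minimal sketch of the relevant top of converge.yml after the change (everything except the connection: local line follows the generated file and may differ):

    - name: Converge
      hosts: all
      connection: local
      tasks:
        # ...tasks as added per the book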

Test a playbook in Kind with Molecule

molecule converge runs successfully (docker ps shows a running kindest/node:v1.20.2 container), but if I run kubectl get job hello after that, I get an error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

It seems the new kubeconfig (~/.kube/config-molecule-test) is not being picked up. Solved that by running export KUBECONFIG=~/.kube/config-molecule-test
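Either export the variable for the shell session or pass the kubeconfig explicitly per command:

    export KUBECONFIG=~/.kube/config-molecule-test
    kubectl get job hello
    # or, without exporting:
    kubectl --kubeconfig ~/.kube/config-molecule-test get job hello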

Kubernetes CI Testing in GitHub Actions

  • You start off with the filename molecule-kind.yml, but later in the subchapter you say "Once you have the ci.yml workflow file added to your repository"; the filenames don't match

Groetjes, Ivo

bonsi • Mar 21 '21 07:03

I would add that for Chapter 2 I encountered issues running the playbook, specifically:

TASK [Create a Deployment for Hello Go.] ***********************************************
fatal: [127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to get client due to HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4f9c9daf40>: Failed to establish a new connection: [Errno 111] Connection refused'))"}

Tracked the issue down to the version of Ansible I had installed, described here: https://github.com/kubernetes-client/python/issues/1333

The solution was to update my Ansible install from the Ansible PPA:

sudo apt remove ansible
sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt update
sudo apt install ansible

After updating, things worked.

blakethepatton • May 19 '22 19:05