k3s-ansible

Connection resets on Armbian

Open jdmarshall opened this issue 5 years ago • 2 comments

Failing to get k3s nodes talking to each other directly, I thought I'd take a crack at using the ansible playbook to make sure I wasn't missing anything.

The problem could be that I'm running on Armbian whereas this seems to be well-tested on Ubuntu and Raspbian. I'm trying to build a PR to fix discrepancies, but I haven't managed a successful playbook run yet. It's hanging now at "Enable and check K3s service".

I see this error on a node:

./syslog:Aug 2 00:58:14 localhost k3s[4895]: time="2020-08-02T00:58:14.866195152Z" level=info msg="Running load balancer 127.0.0.1:43201 -> [t4.local:6443]"
./syslog:Aug 2 00:58:24 localhost k3s[4895]: time="2020-08-02T00:58:24.881040536Z" level=error msg="failed to get CA certs at https://127.0.0.1:43201/cacerts: Get https://127.0.0.1:43201/cacerts: read tcp 127.0.0.1:36796->127.0.0.1:43201: read: connection reset by peer"

Service is up on the master, and accessible from the node.

But.

It's really, really slow:

time curl --insecure https://t4.local:6443/cacerts
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

real 0m10.304s
user 0m0.149s
sys 0m0.042s

Resets plus a slow service seem a bit suspect. Of half a dozen queries, every one returned at just over 10 seconds. There's free memory and the load average is 0.6 on the master. They're on the same dumb switch. There don't seem to be any error logs on the master during the request, but I could be looking in the wrong spots.
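One way to narrow down a consistent delay like this is to time name resolution separately from the TCP connect. The following is a minimal sketch, not part of k3s or the playbook: the function name `time_lookup_and_connect` is hypothetical, and it uses Python's `socket.getaddrinfo`, which goes through the system resolver and may not behave the same way as the resolver k3s itself uses.

```python
import socket
import time


def time_lookup_and_connect(host, port, timeout=15.0):
    """Time name resolution and the TCP connect separately for host:port."""
    start = time.monotonic()
    # Resolve through the system resolver; a slow or broken DNS server shows up here.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    resolve_seconds = time.monotonic() - start

    # Connect to the first address returned, without resolving again.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    start = time.monotonic()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    connect_seconds = time.monotonic() - start

    return {
        "address": sockaddr[0],
        "resolve_seconds": resolve_seconds,
        "connect_seconds": connect_seconds,
    }
```

Calling `time_lookup_and_connect("t4.local", 6443)` on a node (the name and port from the log above) would show whether the roughly 10 seconds is spent in the lookup or in the connection itself.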

What am I missing, or should I be looking for?

jdmarshall, Aug 02 '20 02:08

Partly caused by:

https://github.com/golang/go/issues/35067

as reported in https://github.com/rancher/k3s/issues/2085#issuecomment-678713189

jdmarshall, Aug 22 '20 20:08

After fixing this, I was still hanging.

TASK [k3s/node : Enable and check K3s service]
ok: [some machines that were correct]

One of the nodes was still holding on to the bad DNS server. Figuring out which was a process of elimination, since ansible only reports what it has already done, not the machine it is stuck on. For anyone else reading this: I probably would have gotten more information from using the -v or -vv argument to ansible.

It might be good for the playbook to make sure that each machine can route to the others. Besides bad or inconsistent DNS lookups, there are other networking issues that would leave the machine running the playbook able to see all of the nodes and masters while they are unable to route to each other.
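As a rough illustration of such a check, here is a sketch that could be run on each machine, not just on the one driving the playbook. It is not something the playbook does today: `unreachable_peers` is a hypothetical helper, and the default port 6443 is simply the API port seen in the log earlier in this thread.

```python
import socket


def unreachable_peers(peers, port=6443, timeout=3.0):
    """Return {peer: error} for every peer that cannot be reached over TCP."""
    failures = {}
    for peer in peers:
        try:
            # create_connection resolves the name and then connects, so both
            # DNS failures and routing failures end up in the result.
            with socket.create_connection((peer, port), timeout=timeout):
                pass
        except OSError as exc:
            failures[peer] = f"{type(exc).__name__}: {exc}"
    return failures
```

An empty result on every machine would suggest that name resolution and routing are at least consistent across the cluster. A non-empty result names the machine that cannot be reached, which is the information that was missing from the hung task output.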

jdmarshall, Aug 23 '20 00:08

As I lack knowledge of Armbian, you may want to retest this with the overhaul of this repo. Please reopen this issue if Armbian support is still broken.

Note: Replying to a 2 year old comment expecting a reply is weird.

dereknola, Nov 09 '23 21:11