Ignition timeout when PXE booting with LACP connected to Cisco Nexus 5k.

Open BillSchumacher opened this issue 6 years ago • 16 comments

Issue Report

Ignition fails to wait for network to become ready.

This is problematic for an operator with hosts connected to Cisco Nexus 5k ports configured for LACP, as the ports must negotiate. Even with fast timers and port fast enabled, it takes around 70 seconds for the interfaces to receive an IP address.

The problem is that the server does not attempt to negotiate LACP until it has been configured to do so, so you must wait for LACP to time out.

Remove LACP and everything works as expected, kind of. The PXE boot guide leaves a lot to be desired, but locksmith errors aside, it works.

Bug

Ignition gives up and reboots the system in less than 62 seconds, which is the fastest I've seen an IP address acquired in a bonded configuration. Each reboot is a painful 5-minute wait while troubleshooting this issue, btw.

Container Linux Version

Container Linux 1911.4.0.

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
...
BUG_REPORT_URL="https://issues.coreos.com"

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux? Bare metal, Supermicro server, with Intel NICs.

Expected Behavior

If it waited the 90 seconds that is supposedly the default (as I saw in another issue that I could not find again), or waited until the host has an IP address, there would be no issue.

Actual Behavior

Complete craziness, knowing that it has no IP address and no network connectivity it repeatedly attempts to download a file which in fact requires a network connection and gives up very shortly before it is possible, like 5-10 seconds or so.

Reproduction Steps

  1. Connect two interfaces to a Cisco 5k with channel-group mode active configured:

         int eth1/1
           lacp rate fast
           channel-group 100 mode active

         interface port-channel100
           no lacp graceful-convergence
           spanning-tree port type edge

You could try messing with the switch config, but no combination of settings makes it work; lacp rate fast and no graceful-convergence with spanning tree set to edge buys you 10 seconds or so.

  2. Set up a PXE server and ensure it works.
  3. Attempt to boot and wait for the issue to appear.

Other Information

I also attempted to set the IP address in the kernel boot options but the system halted because "the Transport endpoint is not connected". Probably because of LACP negotiation but who knows. This may not be an issue in your "development" environment but in real world metal cases it is definitely a problem.

BillSchumacher avatar Nov 30 '18 01:11 BillSchumacher

A possible fix would be to add a kernel option to specify interfaces that should be bonded.

Which would allow for proper LACP negotiation on startup and avoid the LACP timeout problem.

BillSchumacher avatar Nov 30 '18 02:11 BillSchumacher

Some more details: I think that it thinks that the interface gained IPv6 connectivity when it did not.

[ 2.373696] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready

It's around this time that the download first tries to take place.

BillSchumacher avatar Nov 30 '18 02:11 BillSchumacher

I suppose a custom OEM would probably also solve this. I should have scrolled to the bottom before spending hours troubleshooting this.

BillSchumacher avatar Nov 30 '18 03:11 BillSchumacher

Custom OEM is the way to go if you're having this issue as well. This should probably still be addressed and/or the documentation should be updated.

BillSchumacher avatar Nov 30 '18 03:11 BillSchumacher

If it's fetching things over http, you can adjust the timeouts in the ignition.timeouts section. Would that fix your problem or is it timing out fetching the initial config?

ajeddeloh avatar Nov 30 '18 19:11 ajeddeloh

It is timing out fetching the initial config.

BillSchumacher avatar Nov 30 '18 23:11 BillSchumacher

As an (ugly) workaround you could use a data url for the initial config (assuming you're using coreos.config.url on the kernel command line) to just be a stub config that sets the timeouts and fetches the real config using the ignition.config.replace section.
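A minimal sketch of how such a stub could be built, assuming the Ignition 2.x spec fields `ignition.timeouts.httpResponseHeaders` and `ignition.config.replace.source`. The spec version, the 30-second timeout, and the URL `http://pxe.example.com/real.ign` are placeholders chosen for illustration, not values from this issue, and whether a longer header timeout is enough for a given LACP setup would need testing.

```python
import base64
import json


def stub_config_data_url(real_config_url, header_timeout_s=30, spec_version="2.2.0"):
    """Build a base64 data URL holding a stub Ignition config.

    The stub only raises the HTTP header timeout and points Ignition at the
    real config via ignition.config.replace. All defaults are assumptions.
    """
    stub = {
        "ignition": {
            "version": spec_version,  # assumed spec version
            "timeouts": {"httpResponseHeaders": header_timeout_s},
            "config": {"replace": {"source": real_config_url}},
        }
    }
    payload = json.dumps(stub, separators=(",", ":")).encode("utf-8")
    # base64 avoids spaces and quotes, which are awkward on a kernel command line
    return "data:text/plain;base64," + base64.b64encode(payload).decode("ascii")


if __name__ == "__main__":
    # Hypothetical URL for the real config served by the PXE host
    url = stub_config_data_url("http://pxe.example.com/real.ign")
    print("coreos.config.url=" + url)
```

The printed string is meant to be appended to the kernel command line in the PXE configuration.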

Ignition on CL depends on network.target. What the network being up means is somewhat nebulous (see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/). This is why we retry instead of trying to detect if network is up.

It's hard to determine what a "reasonable" timeout is. I'd agree that ~60s (what it currently is) is perhaps a little short. Do you think increasing it to ~120s would fix your problem?

ajeddeloh avatar Dec 01 '18 00:12 ajeddeloh

90 seconds would be sufficient, 120 would definitely fix it.

It takes 30 seconds for LACP to time out, spanning tree hello and forwarding takes around 17 seconds without port fast enabled, and then the DHCP client has to do its thing.

I'm unsure, but I believe spanning tree has to do its cycle once more after the LACP timeout, since the port falls back into an individual state. I could be wrong, though; I haven't gone that far into it.

BillSchumacher avatar Dec 01 '18 05:12 BillSchumacher

I'll time the Intel pxe boot agent to see what it uses, they probably have it figured out pretty good.

BillSchumacher avatar Dec 01 '18 05:12 BillSchumacher

They are using ~73 seconds, probably 75.

BillSchumacher avatar Dec 01 '18 05:12 BillSchumacher

With port-fast I get an IP at ~62 seconds and without it at ~72 seconds. The Intel agent always works, though, so maybe 75 is the magic number.

BillSchumacher avatar Dec 01 '18 05:12 BillSchumacher

> it repeatedly attempts to download a file which in fact requires a network connection and gives up very shortly before it is possible, like 5-10 seconds or so.

@CyrusTheVirusG can you please attach the specific log entries showing this (and the whole boot logs if possible)? This doesn't match my expectations and I'm worried there may be something else going on.

@ajeddeloh which 60s timeout are you thinking about?

lucab avatar Dec 09 '18 21:12 lucab

This is the journal file from before tampering with the boot. Basically, I added nsswitch.conf/resolv.conf to resolve the name servers, because it initially failed to find a nameserver. I was completely unaware that the interfaces did not come up. So the "error" at this point is not finding the nameserver, but the root cause is still that the interfaces are not up and the system gives up too early. Journal.txt attached.

journal.txt

DirkTheDaring avatar Dec 09 '18 21:12 DirkTheDaring

Also, an additional topic: I also mount a CA certificate. That is the only thing I have always mounted, as I need my self-signed CA in order to make SSL work. This used to work about 8 months ago.
The exact process is: create a /etc/sssl/certs/mycert.pem and put it into a "cert.cpio.gz", which gets mounted during boot. This used to work on my machine (TM) :)

DirkTheDaring avatar Dec 09 '18 22:12 DirkTheDaring

Ah, nevermind, I was mixing the client HTTP total timeout (which is unlimited) with the config fetching timeout (which is one minute).

lucab avatar Dec 09 '18 22:12 lucab

Err, it's actually not one minute, it's 10.1 + 10.2 + 10.4 + 10.8 + 11.6 + 13.2 = 66.3 seconds (although the last request would start at t=53.1s). Each HTTP attempt times out after 10 seconds, then there is a backoff between attempts that increases exponentially. This really ought to be simplified since the exponential backoff is dwarfed by the HTTP timeout, but for now we could probably increase the max backoff to 30 sec or something similar.
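The retry timeline can be worked through with a short sketch. It assumes six attempts, a 10-second timeout per attempt, and a backoff that starts at 0.1 s and doubles after each attempt; those numbers are read off the figures in this comment, not taken from the Ignition source, and `retry_schedule` is a hypothetical helper.

```python
def retry_schedule(attempts=6, http_timeout=10.0, first_backoff=0.1, factor=2.0):
    """Return a list of (start, end) times in seconds for each attempt.

    Each attempt is assumed to run for the full http_timeout and to be
    followed by a backoff; `end` includes that trailing backoff.
    """
    schedule = []
    t = 0.0
    backoff = first_backoff
    for _ in range(attempts):
        start = t
        t += http_timeout + backoff  # timed-out request plus the wait after it
        schedule.append((start, t))
        backoff *= factor
    return schedule


if __name__ == "__main__":
    schedule = retry_schedule()
    for i, (start, end) in enumerate(schedule, 1):
        print(f"attempt {i}: starts at {start:5.1f}s, window ends at {end:5.1f}s")

    last_start = schedule[-1][0]
    # IP acquisition times reported earlier in the thread (with / without port fast)
    for ip_time in (62.0, 72.0):
        print(f"IP at ~{ip_time:.0f}s: last attempt starts {ip_time - last_start:.1f}s earlier")
```

Under these assumptions the final attempt begins before either of the reported IP acquisition times, which would be consistent with the fetch giving up just before the network is usable.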

ajeddeloh avatar Dec 10 '18 18:12 ajeddeloh