
Discovery using Kubernetes

Open daanporon opened this issue 9 months ago • 6 comments

Description

I'm trying to get Discovery to work in a Kubernetes environment using a LoadBalancer and the Kubernetes NAT manager, and I'm experiencing multiple issues with this. So now I'm wondering whether I'm doing something wrong or whether Discovery was never intended to be used this way. I know the documentation says there are limitations, but I wanted to see if I can work around them and maybe propose fixes for them.

I'm setting up my nodes using Pulumi scripts. I also have DNS enabled and my nodes are accessible via a DNS name.

What I do to test this:

  • I set up an "external" node on AWS or Azure without a bootnode or a static node.
  • I set up a second node on AWS or Azure
    • using the genesis of the first node
    • and added the first node as a bootnode using a DNS-based enode URI (see the example below).
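
For reference, a DNS-based enode URI looks roughly like this (the node ID and hostname below are placeholders, not my real values):

```bash
# Hypothetical example: the 128-hex-character node ID and the hostname are placeholders.
# With --Xdns-enabled=true the bootnode can be referenced by a DNS name instead of an IP.
/opt/besu/bin/besu \
  --config-file=/etc/besu/config.toml \
  --Xdns-enabled=true \
  --bootnodes=enode://<128-hex-char-node-id>@bootnode-a.example.com:30303
```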

How I set up a node:

  • I first deploy my node using exec /opt/besu/bin/besu --config-file=/etc/besu/config.toml --p2p-host=${POD_IP} --Xdns-enabled=true --Xdns-update-enabled=true --nat-method NONE
  • In the config.toml I have p2p-enabled set to true and discovery-enabled set to true, plus a few other settings; I don't give it any bootnodes yet.
  • I have port 30303 configured for both discovery and RLPx.
  • I also set up my load balancer so that it exposes port 30303 as a TCP port connected to the RLPx port, and port 40404 as a UDP port linked to the discovery port of my pod (see the sketch after this list).
  • I have a DNS name attached to my load balancer.
  • From the moment my load balancer becomes available (i.e. its DNS name resolves) I reload my deployment and start the node with /opt/besu/bin/besu --config-file=/etc/besu/config.toml --p2p-host=${POD_IP} --Xdns-enabled=true --Xdns-update-enabled=true --nat-method KUBERNETES --Xnat-method-fallback-enabled=false --Xnat-kube-service-name=THE_SERVICE_NAME --bootnodes BOOTNODE_A,BOOTNODE_B
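
A minimal sketch of that setup, assuming placeholder names (the besu-node service and labels are mine to illustrate, not the real ones) and a cluster that supports mixed-protocol LoadBalancer Services; my real config contains more options than this:

```bash
# Sketch only: names, labels and config values are placeholders.

# Minimal /etc/besu/config.toml
cat > /etc/besu/config.toml <<'EOF'
p2p-enabled=true
discovery-enabled=true
p2p-port=30303
EOF

# LoadBalancer Service exposing 30303/TCP -> RLPx and 40404/UDP -> discovery (pod port 30303).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: besu-node            # must match --Xnat-kube-service-name
spec:
  type: LoadBalancer
  selector:
    app: besu-node
  ports:
    - name: rlpx
      protocol: TCP
      port: 30303
      targetPort: 30303
    - name: discovery
      protocol: UDP
      port: 40404
      targetPort: 30303
EOF
```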

Things I already figured out:

  • The reason why I first boot without a NAT method, and only switch to nat-method KUBERNETES with the bootnodes once my load balancer is up and running, is that I don't want the node to start talking to the bootnodes before it has its correct configuration. If it starts discovery without the KUBERNETES NAT configuration, it will start communicating with the other nodes and they will add it to their bonding-nodes cache. From that moment on, the other nodes will never talk to my node on the right discovery port. By waiting and only communicating with the other nodes once it knows its own discovery port from the load balancer, "everything" should work out fine (see the sketch after this list).
  • There was also an issue where the PING packet didn't take the NAT port mapping into account, but this should be fixed by PR 6578.
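
A rough sketch of that two-phase startup, assuming a placeholder deployment/service called besu-node:

```bash
# Phase 1 runs besu with --nat-method NONE and no bootnodes (first command above),
# so no discovery traffic leaves the node before the external endpoint is known.

# Wait until the LoadBalancer has published an ingress hostname or IP.
until [ -n "$(kubectl get svc besu-node \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}')" ]; do
  sleep 10
done

# Phase 2: redeploy with --nat-method KUBERNETES and the bootnodes (second command above),
# so the node only starts talking to peers once it knows its external discovery port.
kubectl rollout restart deployment/besu-node
```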

Issues I'm still experiencing:

I have been testing this on both Azure and AWS, because GKE doesn't support this kind of mixed-protocol load balancer right now.

  • On Azure most of it seems to work; I only have an issue with the messages sent after the initial PING. My fix in PR 6578 makes sure the receiving node knows which IP and port it needs to use to communicate back to my node. But the subsequent messages don't carry a From field in the PacketData, and because of this PeerDiscoveryAgent#deriveHost falls back to the source host and source port, which breaks PeerDiscoveryController#resolvePeer: the node is found but the endpoint doesn't match. I tried this by disabling that filter, and then everything works correctly. I don't know if removing the filter is a good idea; maybe we need to add a From endpoint to the other PacketData types as well, so that the endpoints do match.
  • On AWS I have another issue. I have the feeling the load balancers on AWS work a bit differently: they are assigned a DNS name instead of an external IP. What I see happening is that the PING packet from my node reaches the external node, but when the external node PONGs back, using the right IP and port from the From endpoint in the PacketData, it doesn't always reach my node. My guess is that the IP address of the load balancer is shared and it doesn't always know how to direct the traffic. I sometimes saw the PONG packet being received after a few attempts, but it's definitely not always the case. I don't know how to solve this; the only viable solution I can think of is using DNS names here as well. I use a DNS-based enode URL in the bootnodes configuration to connect to the external node, and that seems to always work correctly: I always see the PINGs reaching that external node. But my node identifies itself in the From field of the PingPacketData using the IP address and the port mappings it found via the Kubernetes NAT manager, and I don't think that IP address is good enough to direct the traffic back. Maybe an xdns flag to set the hostname of the node, or some kind of fixed NAT manager using DNS names, would be a good solution for my use cases? I think an xdns flag would make the most sense, wdyt?
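
To illustrate the AWS difference: the load balancer is addressed by a DNS name that typically resolves to one address per availability zone, so the single IP the NAT manager records isn't necessarily the address the return traffic should be sent to (the hostname and addresses below are made up):

```bash
# Hypothetical AWS NLB hostname; the resolved addresses are placeholders.
dig +short my-besu-nlb-0123456789abcdef.elb.eu-west-1.amazonaws.com
# 3.248.10.17   (example output)
# 54.74.200.33  (example output)
# One A record per AZ: the address a peer resolves, or the one the NAT manager
# advertised, may not be the one that routes back to this node's discovery port.
```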

Possible fixes:

  • Implement an xdns-domain-name feature which you can use to register the domain name of your node, so that this is used in discovery. This would help with the AWS load balancer issue.
  • Keep sending the From data in the PacketData so that it never falls back to the source of the packet.

daanporon avatar Apr 29 '24 15:04 daanporon

Keep sending the From data in the PacketData so that it never falls back to the source of the packet.

I don't think we need to do this for every type of packet, only the ones where you are the initiator. So I think PING, ENR_REQUEST and FIND_NEIGHBOURS?

daanporon avatar May 16 '24 10:05 daanporon

It would be good to get some feedback, so maybe I can try to implement those changes ... but I'm not familiar enough with all of the code to be sure these are acceptable solutions.

daanporon avatar May 16 '24 10:05 daanporon

Maybe instead of xdns-domain-name we should allow p2p-host to be a domain name if xdns-enabled=true?
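
Something like this is what I mean. Note that this is a proposed behaviour, not something that works today, and the hostnames are placeholders:

```bash
# Proposed / hypothetical: --p2p-host given a DNS name when --Xdns-enabled=true,
# so the node advertises the load balancer's hostname instead of a resolved IP.
exec /opt/besu/bin/besu --config-file=/etc/besu/config.toml \
  --p2p-host=besu-node.example.com \
  --Xdns-enabled=true --Xdns-update-enabled=true \
  --nat-method KUBERNETES --Xnat-method-fallback-enabled=false \
  --Xnat-kube-service-name=besu-node \
  --bootnodes enode://<node-id>@bootnode-a.example.com:30303
```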

daanporon avatar May 16 '24 15:05 daanporon

Hi @daanporon I think this is a really good idea. For AWS though, you don't specifically need a load balancer and can skip that part. I've got a PR (https://github.com/hyperledger/besu-docs/pull/1597) you can use which makes use of EC2 instances directly to establish connectivity. I haven't found an equivalent for Azure yet, so this would be a good solution. @matkt is the best person to ask about the NAT manager.

joshuafernandes avatar May 17 '24 02:05 joshuafernandes

We now also did something similar, using NodePort services and using the IP address of the nodes our containers are hosted on. Tested this on AWS and GKE and it seemed to work fine. We use kubectl get nodes to fetch the external IP of the node, which works generically across cloud providers.
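
Roughly what that looks like, as a sketch with placeholder names, assuming the pod is exposed on the host node via a NodePort (or hostPort) on 30303 and that NODE_NAME is injected into the pod via the downward API:

```bash
# Sketch only: NODE_NAME and the ports are placeholders.
# Fetch the external IP of the host node the pod is scheduled on.
EXTERNAL_IP=$(kubectl get node "${NODE_NAME}" \
  -o jsonpath='{.status.addresses[?(@.type=="ExternalIP")].address}')

# Start besu advertising that address directly; no NAT manager needed.
exec /opt/besu/bin/besu --config-file=/etc/besu/config.toml \
  --p2p-host="${EXTERNAL_IP}" --p2p-port=30303 --nat-method NONE
```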

daanporon avatar Jun 07 '24 13:06 daanporon

Good to hear @daanporon! I'm working on some charts for Besu and Teku that can be used with the above implementation, which I'll make available soon. NodePort is fine for one or a few nodes, but if you have many you can't reuse the same service across host nodes because of port contention. I've used a ClusterIP to overcome that and kept the RBAC of the pods to absolute least privilege. Either way, I'm happy you have a working solution :)

The cloud providers also have a metadata service that returns the IP, so that is another option (I know AWS and Azure do, and I think GKE does the same).
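
For reference, these are roughly the metadata calls that return the public IP on AWS (IMDSv2) and Azure; check the provider docs for the exact paths and API versions:

```bash
# AWS (IMDSv2): fetch a session token, then read the public IPv4 address.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/public-ipv4

# Azure IMDS: public IP of the first NIC (the Metadata header is required).
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance/network/interface/0/ipv4/ipAddress/0/publicIpAddress?api-version=2021-02-01&format=text"
```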

joshuafernandes avatar Jun 09 '24 21:06 joshuafernandes