Discovery using Kubernetes
Description
I'm trying to get discovery to work in a Kubernetes environment using a LoadBalancer and the Kubernetes NAT manager, and I'm experiencing multiple issues with this. So now I'm wondering if I'm doing something wrong or if discovery was never intended to be used this way. I know the documentation says there are limitations, but I was trying to see if I can work around those and maybe propose fixes for them.
I'm setting up my nodes using Pulumi scripts. I also have DNS enabled, and my nodes are accessible via a DNS name.
What I do to test this:
- I set up an "external" node on AWS or Azure without a bootnode or a static node.
- I set up a second node on AWS or Azure:
  - using the genesis of the first node,
  - and adding the first node as a bootnode using a DNS-based enode URI (see the example after this list).
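For reference, such a bootnode entry might look like the sketch below. The node ID and hostname are placeholders; the standard `discport` query parameter is only needed when the discovery (UDP) port differs from the RLPx (TCP) port, as it does behind the load balancer described next.

```bash
# Hypothetical DNS-based enode URI as a bootnode (requires --Xdns-enabled=true).
# Node ID and hostname are placeholders, not values from this setup.
--bootnodes "enode://<128-hex-char-node-id>@bootnode.example.com:30303?discport=40404"
```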
How I set up a node:
- I first deploy my node using:
  `exec /opt/besu/bin/besu --config-file=/etc/besu/config.toml --p2p-host=${POD_IP} --Xdns-enabled=true --Xdns-update-enabled=true --nat-method NONE`
- In the config.toml I have `p2p-enabled` set to `true` and `discovery-enabled` set to `true`, plus a few other settings; I don't give it any bootnodes yet.
- I have port `30303` configured for both `discovery` and `rlpx`.
- I also set up my load balancer, with a DNS name attached to it, which exposes `30303` as a TCP port connected to the `rlpx` port and `40404` as a UDP port linked to the `discovery` port of my pod (a sketch of such a Service follows this list).
- From the moment my load balancer becomes available (i.e., its DNS name resolves), I reload my deployment and start the node with:
  `/opt/besu/bin/besu --config-file=/etc/besu/config.toml --p2p-host=${POD_IP} --Xdns-enabled=true --Xdns-update-enabled=true --nat-method KUBERNETES --Xnat-method-fallback-enabled=false --Xnat-kube-service-name=THE_SERVICE_NAME --bootnodes BOOTNODE_A,BOOTNODE_B`
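To make the setup concrete, here is a minimal sketch of the config.toml and the mixed-protocol Service described above. The port numbers and service name mirror the description; the pod label and everything else are assumptions, and not every provider supports TCP and UDP on the same LoadBalancer Service (which is why GKE is ruled out below).

```bash
# Sketch of the config.toml described above (bootnodes deliberately omitted).
cat > /etc/besu/config.toml <<'EOF'
p2p-enabled=true
discovery-enabled=true
p2p-port=30303   # used for both discovery (UDP) and RLPx (TCP)
EOF

# Sketch of the mixed-protocol LoadBalancer Service (names are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: THE_SERVICE_NAME   # must match --Xnat-kube-service-name
spec:
  type: LoadBalancer
  selector:
    app: besu-node          # assumed pod label
  ports:
    - name: rlpx
      protocol: TCP
      port: 30303           # external TCP port -> pod RLPx port
      targetPort: 30303
    - name: discovery
      protocol: UDP
      port: 40404           # external UDP port -> pod discovery port
      targetPort: 30303
EOF
```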
Things I already figured out:
- The reason I first boot without a NAT method, and only switch to `--nat-method KUBERNETES` with the bootnodes once my load balancer is up and running, is that I don't want the node to start talking to the bootnodes before it has its correct configuration. If it starts discovery without the KUBERNETES NAT configuration, it will start communicating with the other nodes, and they will add it to their bonding-nodes cache. From that moment on, the other nodes will never talk to my node on the right discovery port. By waiting, and only communicating with the other nodes once it knows its own discovery port from the load balancer, "everything" should work out fine (a sketch of this wait-then-restart step follows this list).
- There was also an issue where the `PING` packet didn't take the NAT port mapping into account, but this should be fixed by PR 6578.
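A rough sketch of that two-phase start, assuming the load balancer's hostname is known up front. The flags are taken from the commands above; the wait loop itself is an illustration of the deployment logic, not Besu functionality.

```bash
# Hedged sketch: hold off the NAT-aware start until the LB's DNS resolves.
LB_DNS="mynode.example.com"   # placeholder load-balancer hostname
until getent hosts "$LB_DNS" >/dev/null; do
  echo "waiting for $LB_DNS to resolve..."
  sleep 5
done
exec /opt/besu/bin/besu --config-file=/etc/besu/config.toml \
  --p2p-host="${POD_IP}" --Xdns-enabled=true --Xdns-update-enabled=true \
  --nat-method KUBERNETES --Xnat-method-fallback-enabled=false \
  --Xnat-kube-service-name=THE_SERVICE_NAME \
  --bootnodes BOOTNODE_A,BOOTNODE_B
```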
Issues I'm still experiencing:
I have been testing this on both Azure and AWS, because GKE doesn't currently support these kinds of mixed-protocol load balancers.
- On Azure most of it seems to work; I only have an issue with the messages sent after the initial `PING`. My fix in PR 6578 ensures the receiving node knows which IP and port it needs to use to communicate back to my node. But the subsequent messages don't send a `From` field in the `PacketData`, so `PeerDiscoveryAgent#deriveHost` falls back to the source host and source port, which breaks `PeerDiscoveryController#resolvePeer`: the node is found, but the endpoint doesn't match. I verified this by disabling that filter, after which everything works correctly. I don't know if removing the filter is a good idea, but maybe we need to add a `From` endpoint to the other `PacketData` types as well, so that the endpoints do match.
- On AWS I have another issue. I have the feeling the load balancers on AWS work a bit differently: they are assigned a DNS name instead of an external IP. What I see happening is that the `PING` packet from my node reaches the external node, but when the external node `PONG`s back, using the right IP and port from the `From` endpoint in the `PacketData`, it doesn't always reach my node. My guess is that the IP address of the load balancer is shared, and it doesn't always know how to direct the traffic. I sometimes saw the `PONG` packet arrive after a few attempts, but it's definitely not always the case. I don't know how to solve this; the only viable solution I can think of is using the DNS names here as well. I use a DNS-based enode URL in the bootnodes configuration to connect to the external node, and that always seems to work correctly: I always see the `PING`s reaching that external node. But my node identifies itself in the `From` field of the `PingPacketData` using the IP address and the port mappings it found via the Kubernetes NAT manager, and I don't think that IP address is good to direct traffic to. Maybe an xdns flag to set the hostname of the node, or some kind of fixed NAT manager using DNS names, would be a good solution for my use cases? I think an xdns flag would make the most sense, wdyt?
Possible fixes:
- Implement an `xdns-domain-name` feature which you can use to register the domain name of your node, so that we use it in discovery. This would help with the AWS load balancer issue.
- Keep sending the `From` data in the `PacketData` so that it never falls back to the source of the packet.
> Keep sending the `From` data in the `PacketData` so that it never falls back to the source of the packet.

I don't think we need to do this for every type of packet, only the ones where you are the initiator. So I think `PING`, `ENR_REQUEST`, and `FIND_NEIGHBOURS`?
It would be good to get some feedback, so maybe I can try to implement those changes ... but I'm not familiar enough with all of the code to be sure these are acceptable solutions.
Maybe instead of `xdns-domain-name` we should allow `p2p-host` to be a domain name if `Xdns-enabled=true`? A hypothetical invocation is sketched below.
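Purely as an illustration of that proposal; this is not a supported flag combination today, and the hostname is a placeholder.

```bash
# Hypothetical: --p2p-host given a DNS name instead of the pod IP,
# accepted because --Xdns-enabled=true. Illustration only, not current Besu behavior.
/opt/besu/bin/besu --config-file=/etc/besu/config.toml \
  --p2p-host=mynode.example.com \
  --Xdns-enabled=true --Xdns-update-enabled=true \
  --nat-method KUBERNETES --Xnat-kube-service-name=THE_SERVICE_NAME
```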
Hi @daanporon, I think this is a really good idea. For AWS, though, you don't specifically need a load balancer and can skip that part. I've got a PR here, https://github.com/hyperledger/besu-docs/pull/1597, which you can use; it makes use of EC2 instances directly to establish connectivity. I haven't found an equivalent for Azure yet, so this would be a good solution. @matkt is the best person to ask about the NAT manager.
We now also did something similar, using NodePort services and the IP addresses of the nodes our containers are hosted on. We tested this on AWS and GKE and it seemed to work fine. We use `kubectl get nodes` to fetch the external IP of the node, which works generically across cloud providers (see the sketch below).
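A sketch of that lookup, assuming the cloud provider populates an `ExternalIP` address on the `Node` objects; the jsonpath filter picks that address out per node.

```bash
# Print each node's name and external IP (assumes the provider sets
# an address of type ExternalIP on the Node object).
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'
```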
Good to hear @daanporon! I'm working on some charts for Besu and Teku that can be used with the above implementation, which I'll make available soon. `NodePort` is fine too for one or a few nodes, but if you have many, you can't reuse the same service across host nodes because of port contention. I've used `ClusterIP` to overcome that and kept the RBAC of the pods to absolute least privilege. Either way, I'm happy you have a working solution :)
The cloud providers also offer a metadata service that returns the IP, so that is another option (I know AWS and Azure do, and I think GKE does the same). A sketch follows.
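For illustration, the instance metadata endpoints I'm aware of. The AWS example uses IMDSv1 for brevity; IMDSv2 additionally requires a session token header.

```bash
# AWS: public IPv4 of the instance (IMDSv1; IMDSv2 needs a token).
curl -s http://169.254.169.254/latest/meta-data/public-ipv4

# Azure: public IP via the Instance Metadata Service.
curl -s -H "Metadata:true" \
  "http://169.254.169.254/metadata/instance/network/interface/0/ipv4/ipAddress/0/publicIpAddress?api-version=2021-02-01&format=text"

# GCP/GKE: external IP of the first network interface.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/access-configs/0/external-ip"
```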