kubo-release
kubectl exec/port-forward don't play nice when using corporate DNS
What happened:
My kubo test cluster in Azure looks like this:
```
Instance                                     Process State  AZ  IPs           VM CID                                                                             VM Type        Active
master/a7637fc0-0e2f-45dd-8e01-00d7e2e51496  running        z1  10.255.10.8   agent_id:6259000d-d684-4025-9c79-778c8350376d;resource_group_name:banana-env-cfcr  small          true
worker/088a2160-2eb9-4a2a-9d31-63ca4ee73982  running        z1  10.255.10.7   agent_id:e9053429-7cef-4844-926a-cdf12de6e90d;resource_group_name:banana-env-cfcr  small-highmem  true
worker/e738cf13-120b-4c07-87c3-c3e86078b013  running        z3  10.255.10.10  agent_id:b8d042ec-5037-4a5b-9d78-23427a06cf15;resource_group_name:banana-env-cfcr  small-highmem  true
worker/ef7d509b-502c-411b-8c89-8f3d55cb5259  running        z2  10.255.10.9   agent_id:8e410ac7-69a4-4311-a0d7-8b4fba44e15e;resource_group_name:banana-env-cfcr  small-highmem  true
```
When attempting to `kubectl exec` or `kubectl port-forward` in the cluster, I am greeted with a `server misbehaving` error instead of being dropped into a shell in (or having ports forwarded from) the running container:
```
$ k exec -it kibana-logging-es-cluster-56d6dfcbdb-xx5fb /bin/bash
Error from server: error dialing backend: dial tcp: lookup e9053429-7cef-4844-926a-cdf12de6e90d on 192.168.140.69:53: server misbehaving
```
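For anyone hitting the same error: the hostname the dial fails on is the BOSH agent ID of a worker node, so it can be mapped back to an instance by grepping `bosh instances --details`-style output. A minimal sketch, using the instance table from this issue as sample data (on a live director you would pipe in the real `bosh -d azurecfcr instances --details` output instead):

```shell
#!/bin/sh
# Map the agent ID from the "error dialing backend" message back to a BOSH
# instance. The sample data below is the instance table from this issue.
AGENT_ID="e9053429-7cef-4844-926a-cdf12de6e90d"

cat > /tmp/instances.txt <<'EOF'
master/a7637fc0-0e2f-45dd-8e01-00d7e2e51496 running z1 10.255.10.8 agent_id:6259000d-d684-4025-9c79-778c8350376d;resource_group_name:banana-env-cfcr small true
worker/088a2160-2eb9-4a2a-9d31-63ca4ee73982 running z1 10.255.10.7 agent_id:e9053429-7cef-4844-926a-cdf12de6e90d;resource_group_name:banana-env-cfcr small-highmem true
worker/e738cf13-120b-4c07-87c3-c3e86078b013 running z3 10.255.10.10 agent_id:b8d042ec-5037-4a5b-9d78-23427a06cf15;resource_group_name:banana-env-cfcr small-highmem true
worker/ef7d509b-502c-411b-8c89-8f3d55cb5259 running z2 10.255.10.9 agent_id:8e410ac7-69a4-4311-a0d7-8b4fba44e15e;resource_group_name:banana-env-cfcr small-highmem true
EOF

# Print the instance name and IP of the node whose agent ID failed to resolve.
grep "agent_id:${AGENT_ID}" /tmp/instances.txt | awk '{print $1, $4}'
# → worker/088a2160-2eb9-4a2a-9d31-63ca4ee73982 10.255.10.7
```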
What you expected to happen:
The usual behaviour of both commands: an interactive shell from `kubectl exec`, and a forwarded local port from `kubectl port-forward`.
How to reproduce it (as minimally and precisely as possible):
We have built the latest kubo-release from master, including a cherry-picked commit from the following PR: https://github.com/cloudfoundry-incubator/kubo-release/pull/257
In our Azure subscription, we must use the corporate DNS server (i.e. not Azure DNS) enforced on the vnet (this is current corporate policy and not in my control).
@andyliuliming informs me that machines created in a vnet using Azure DNS automatically register the instance agent_id as the machine name, so DNS resolution for the above commands isn't an issue in those circumstances.
I was (and still am) quite excited that BOSH operates its own DNS inside the walls of the vnet, so I had hoped this would provide sufficient cover for internal machine resolution without having to go out to the corporate DNS. Unfortunately, it appears bosh-dns doesn't currently resolve machines by raw agent_id (there is a conversation about bosh-dns resolution, including by raw agent_id, here: https://github.com/cloudfoundry/bosh-dns-release/issues/30).
I have been wondering whether https://bosh.io/docs/dns/#aliases might help here as a workaround, but I'd also like to get a handle on how the BOSH team feel about supporting agent_id resolution. bosh-dns seems like the right place to solve this problem, and it would be a real advantage over other k8s deployment solutions.
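For illustration, the aliases workaround could take roughly this shape (a sketch only, with several assumptions: the addon format from cloudfoundry/bosh-dns-aliases-release, alias entries generated after deploy once the agent IDs are known, and a `default` network name; none of this comes from PR #257 itself):

```yaml
# Hypothetical runtime-config addon making a worker's agent ID resolvable
# inside the vnet via bosh-dns. Agent IDs are only known once the VMs
# exist, so these entries would have to be templated in post-deploy.
addons:
- name: kubo-agent-id-aliases
  jobs:
  - name: bosh-dns-aliases
    release: bosh-dns-aliases
    properties:
      aliases:
      - domain: e9053429-7cef-4844-926a-cdf12de6e90d   # agent ID from the error above
        targets:
        - query: '*'            # NOTE: '*' targets all workers, not just this
          instance_group: worker  # one, which is part of why this is a hack
          deployment: azurecfcr
          network: default        # assumed network name
          domain: bosh
```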
Anything else we need to know?:
Environment:
- Deployment Info (`bosh -d <deployment> deployment`):
```
Name       Release(s)         Stemcell(s)                                    Config(s)          Team(s)
azurecfcr  bosh-dns/1.10.0    bosh-azure-hyperv-ubuntu-xenial-go_agent/0000  3 cloud/azurecfcr  -
           bpm/0.13.0                                                        9 cloud/default
           cfcr-etcd/1.5.0                                                   2 runtime/dns
           docker/32.1.0
           kubo/0.23.0+dev.4  <-- current master + cherry-picked commit outlined above

1 deployments
```
The stemcell here is the latest xenial build from the latest master branch of bosh-linux-stemcell-builder.
- Environment Info (`bosh -e <environment> environment`):
```
Name      bosh-banana-env
UUID      f6cf854f-8767-4083-9758-f744fc2d23fb
Version   268.0.1 (00000000)
CPI       azure_cpi
Features  compiled_package_cache: disabled
          config_server: enabled
          dns: disabled
          snapshots: disabled
User      admin
```
- Kubernetes version (`kubectl version`): 1.11.3
- Cloud provider (e.g. aws, gcp, vsphere): azure
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/161722421
The labels on this github issue will be updated when the story is started.
There are three ways to fix this one, @grenzr:

1. The customer enables DDNS registration on their DNS server for their K8s clusters (or deploys an intermediate DNS server) so that the nodes are resolvable; each Kubo node then needs to do an nsupdate against the DDNS. (This is automatic when using Azure DNS, as it is tied to the DHCP on the instance.)
2. The BOSH DNS team adds a feature where agent IDs are resolvable for BOSH nodes. That's a longer conversation.
3. You do a BOSH DNS aliases hack like my original PR.
My thinking is #1 would be the easiest.
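To make option 1 concrete, each worker would register its own agent ID in the corporate zone. A minimal sketch of generating the `nsupdate` batch (the server address is the resolver from the kubectl error above; the zone name `corp.example.` and the TSIG key handling are placeholder assumptions):

```shell
#!/bin/sh
# Generate an nsupdate batch registering this worker's agent ID in the
# corporate DDNS zone. Values for AGENT_ID/NODE_IP come from the instance
# table in this issue; ZONE is a placeholder assumption.
AGENT_ID="e9053429-7cef-4844-926a-cdf12de6e90d"
NODE_IP="10.255.10.7"
ZONE="corp.example."

cat > /tmp/ddns-update.txt <<EOF
server 192.168.140.69
zone ${ZONE}
update delete ${AGENT_ID}.${ZONE} A
update add ${AGENT_ID}.${ZONE} 300 A ${NODE_IP}
send
EOF

# Review the batch; a real run would be:
#   nsupdate -k /path/to/tsig.key /tmp/ddns-update.txt
cat /tmp/ddns-update.txt
```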
Thanks for the feedback @svrc-pivotal, that's pretty much what I was thinking too.
I'll try 3 first, I think, as I'd prefer to keep all the deployment resolutions insulated and local to that vnet. If that doesn't work out well, I've always got option 1 in my back pocket, which I know will work, as I have already done it during a spike of a different k8s deployment engine.
Perhaps I'll put my 2c worth into cloudfoundry/bosh-dns-release#30 as well and see where it goes, but it's good that the DNS aliases functionality can already be used now.