/opt/azure/containers/provision.sh needs to respect DNS failover mechanism
Hi team, I am testing DNS failover during AKS creation. I set an unreachable IP as the primary DNS server for the AKS cluster, while the secondary DNS server is reachable.
As a result, the CSE on the VMSS failed to provision.
Error messages: Enable failed: failed to execute command: command terminated with exit status=124 [stdout] { "ExitCode": "124", "Output": "ookup deaaks901-deaaks-c4ab-hu74r11b.hcp.eastasia.azmk8s.io\n++ '[' 60 -eq 100 ']'\n++ sleep 1\n++ for i in $(seq 1 $retries)\n++ timeout 10 nslookup deaaks901-deaaks-c4ab-hu74r11b.hcp.eastasia.azmk8s.io\n++ '[' 61 -eq 100 ']'\n++ sleep 1\n++ for i in $(seq 1 $retries)\n++ timeout 10 nslookup deaaks901-deaaks-c4ab-hu74r11b.hcp.eastasia.azmk8s.io\n++'[' 79 -eq 100 ']'\n++ sleep 1\n++ for i in $(seq 1 $retries)\n++ timeout 10 nslookup deaaks901-deaaks-c4ab-hu74r11b.hcp.eastasia.azmk8s.io\n++
After some investigation, we found that the issue may be related to the script /opt/azure/containers/provision.sh
It seems that the timeout setting in this script does not respect the DNS failover mechanism: the trace above shows nslookup being run under timeout 10, but based on our tests, nslookup needs about 15 seconds to complete the failover to the secondary server.
Test command: time nslookup google.com
Could you please extend the nslookup timeout in this script to 20 seconds? That would let the lookup finish failing over to the secondary DNS server before it is killed. Thank you!
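To illustrate the behaviour described above, here is a minimal sketch of a retry loop shaped like the one visible in the trace (a retry counter, sleep 1, and timeout 10 around nslookup). The function name retry_with_timeout and its argument order are assumptions made for illustration; this is not the actual code in provision.sh. The only external fact relied on is that GNU timeout returns exit status 124 when it kills a command that ran past its limit, so a lookup that needs longer than the limit fails on every attempt regardless of the retry count.

```shell
#!/usr/bin/env bash
# Hypothetical helper, reconstructed from the shape of the trace only.
# Usage: retry_with_timeout <retries> <sleep_seconds> <timeout_seconds> <command...>
retry_with_timeout() {
  local retries=$1 wait_sleep=$2 limit=$3
  shift 3
  local i
  for i in $(seq 1 "$retries"); do
    # timeout kills the command after $limit seconds (exit status 124),
    # so a command that always needs longer than $limit can never succeed here.
    if timeout "$limit" "$@"; then
      return 0
    fi
    # Give up after the last attempt; otherwise wait and try again.
    if [ "$i" -eq "$retries" ]; then
      return 1
    fi
    sleep "$wait_sleep"
  done
}

# Example invocations (not executed here):
#   retry_with_timeout 100 1 10 nslookup example.com   # killed on every attempt if failover takes ~15 s
#   retry_with_timeout 100 1 20 nslookup example.com   # leaves room for a ~15 s failover
```

With a 10 second limit and a lookup that takes about 15 seconds to fail over, raising the retry count does not help; only a limit longer than the failover time does.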
We adjusted the parameters: https://github.com/Azure/AgentBaker/blob/b8a0752b678aef5c7d2c472f96659294cf7a730b/parts/linux/cloud-init/artifacts/cse_main.sh#L320
Does that work for you?
I believe this was fixed. Feel free to re-open if not.