DNS service not working for System VMs on a fresh install of CloudStack

Open bradsmin opened this issue 2 years ago • 7 comments

ISSUE TYPE
  • Other
COMPONENT NAME
UI, System VMs
CLOUDSTACK VERSION
CloudStack 4.18
CONFIGURATION

Zone with default Advanced networking, with VLAN as the isolation method
Zone has only one "Physical Network 1", with all traffic types (Guest, Management, Public, Storage) passing through it
KVM Management Server and KVM Host are the same machine; it's a testing environment
Open vSwitch and DPDK are enabled, and the bridge interface was created using Open vSwitch commands
Primary and Secondary storage use NFS mount points

OS / ENVIRONMENT

Ubuntu 22.04.2 LTS Codename: jammy

SUMMARY

The System VMs Agent state is not Up

STEPS TO REPRODUCE
Install a fresh Ubuntu 22.04 LTS server
Enable Open vSwitch and DPDK on the interface and create the bridge interface
Install CloudStack on the server and also make this server the KVM host
Configure a CloudStack zone with default Advanced networking
The zone has only one "Physical Network 1", with all traffic types (Guest, Management, Public, Storage) passing through it
EXPECTED RESULTS
The System VMs' Agent state is OK
ACTUAL RESULTS
After the CloudStack install and basic zone configuration, go to the System VMs section: the state is "running" but the Agent state is not "OK".

Accessed one of the system VMs using a command like ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@link-local

Ran /usr/local/cloud/systemvm/ssvm-check.sh

It reported: ERROR: DNS not resolving cloudstack.apache.org

Confirmed that the Google DNS server is present in resolv.conf

Checked "service cloud status" and noticed that the system VM can't connect to port 8250 of the management server over the management server IP

Further troubleshooting from inside the system VM shows that it can ping the management server IP, but cannot connect to any of the management server's active ports, such as the SSH port or management ports like 8250.
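The "ping works, TCP does not" symptom above can be reproduced with a small probe. This is a minimal sketch, assuming Python 3 is available wherever it is run; the function name `tcp_port_open` and the address in the usage comment are illustrative and not taken from the setup described here.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example usage (192.0.2.10 is a placeholder for the management server IP):
#   for port in (22, 8250):
#       print(port, tcp_port_open("192.0.2.10", port))
```

If ICMP succeeds while every port returns False, the handshake is being lost somewhere between the VM's vnet port and the management server's listening socket, which narrows the search to the bridge datapath or host filtering rather than name resolution.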

While checking traffic on the bridge interface with ovs-tcpdump, we noticed that incoming traffic from the system VMs reaches the bridge interface, but there is no outgoing response towards the system VMs. Below is a sample of the traffic captured at the bridge interface on CloudStack port 8250. We believe we set up Open vSwitch with DPDK correctly and are not sure what we are missing.

12:08:52.312394 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
<.............E..4\.@.@.....?...?... :..B.........................
12:08:53.314315 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
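The two SYN lines in the capture carry the same sequence number one second apart, which is what a retransmitted, unanswered SYN looks like. As a rough sketch (not part of any CloudStack or Open vSwitch tooling), the following helper scans tcpdump-style text for SYNs that never receive a SYN-ACK in the reverse direction. The function name and the sample text are illustrative, and the regular expression assumes the one-line-per-packet format shown in the capture.

```python
import re
from collections import Counter

# Matches lines such as: "... IP SRC.port > DST.port: Flags [S], seq ..."
LINE = re.compile(r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]")


def unanswered_syns(capture: str) -> dict:
    """Map (src, dst) to the number of SYNs seen with no SYN-ACK coming back."""
    syns = Counter()
    answered = set()
    for line in capture.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # skip payload dumps and anything that is not a header line
        src, dst, flags = m["src"], m["dst"], m["flags"]
        if flags == "S":
            syns[(src, dst)] += 1
        elif flags == "S.":
            # A SYN-ACK travels in the reverse direction of the SYN it answers.
            answered.add((dst, src))
    return {flow: n for flow, n in syns.items() if flow not in answered}


SAMPLE = """\
12:08:52.312394 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, length 0
12:08:53.314315 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, length 0
"""
print(unanswered_syns(SAMPLE))
```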

Ran "ovs-ofctl dump-flows cloudbr0" and got the result below. The rule looks fine.

cookie=0x0, duration=148279.131s, table=0, n_packets=7453119, n_bytes=2575374706, priority=0 actions=NORMAL

Not sure what we are missing.

bradsmin Apr 26 '23 04:04

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] Apr 26 '23 04:04

@bradsmin As you described it, this does not look like a DNS issue but a firewall issue:

Further troubleshooting shows from inside system vm, only able to ping management server IP but no connection with any active ports of management server like ssh port , management ports like 8250 etc

Is the system VM able to ping Google DNS (8.8.8.8)?

Can you share the agent.properties on the KVM host, the XML dump of the system VM, and the output of some ovs commands?

weizhouapache Apr 26 '23 07:04

Yes, I can ping 8.8.8.8 from inside the system VM, but I am not able to ping google.com

ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=60 time=1.033 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=0.966 ms

ping google.com
ping: unknown host
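The two pings show that IP reachability works while name resolution does not. A minimal sketch of the same check, assuming Python 3 is available on the machine being tested; `resolves` is an illustrative name, and it uses whatever resolver the system is configured with (for example the nameservers in resolv.conf).

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return False


# Example usage:
#   print(resolves("cloudstack.apache.org"))
```

A False result for a well-known name, combined with a successful ping to the resolver's IP, suggests that the DNS queries or their replies are being dropped rather than that the resolver address is wrong.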

agent.properties content

libvirt.vif.driver=com.cloud.hypervisor.kvm.resource.OvsVifDriver
cluster=1
openvswitch.dpdk.enabled=true
pod=1
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
private.network.device=cloudbr0
domr.scripts.dir=scripts/network/domr/kvm
openvswitch.dpdk.ovs.path=/var/run/openvswitch
guest.network.device=cloudbr0
keystore.passphrase=(key exists, removed here for safety)
hypervisor.type=kvm
port=8250
zone=1
public.network.device=cloudbr0
local.storage.uuid=(ID exists, removed here for safety)
host=HOSTIP@static
guid=(guid exists, removed here for safety)
LibvirtComputingResource.id=1
network.bridge.type=openvswitch
workers=5
iscsi.session.cleanup.enabled=false
vm.migrate.wait=3600
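To make the networking-related settings easier to compare between a working and a non-working host, the file can be read into a dictionary. This is only a sketch: `parse_properties` and `bridges_in_use` are hypothetical helper names, the sample contains a subset of the keys posted above (redacted keys omitted), and the parser handles only simple key=value lines.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def bridges_in_use(props: dict) -> set:
    """Collect the bridge devices named by the three *.network.device keys."""
    keys = ("private.network.device", "guest.network.device", "public.network.device")
    return {props[k] for k in keys if k in props}


SAMPLE = """\
libvirt.vif.driver=com.cloud.hypervisor.kvm.resource.OvsVifDriver
openvswitch.dpdk.enabled=true
openvswitch.dpdk.ovs.path=/var/run/openvswitch
private.network.device=cloudbr0
guest.network.device=cloudbr0
public.network.device=cloudbr0
network.bridge.type=openvswitch
"""

props = parse_properties(SAMPLE)
print(props["network.bridge.type"], props["openvswitch.dpdk.enabled"], bridges_in_use(props))
```

With the values posted here, all three traffic types resolve to the single bridge cloudbr0, so any datapath problem on that bridge affects management, guest, and public traffic at once.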

XML dump attached as a file (xmldump.txt). Only IPs, VNC info, IDs, and MAC address values were removed.

Output of command ovs-vsctl show

Bridge cloud0
    Port vnet0
        Interface vnet0
    Port vnet3
        Interface vnet3
    Port cloud0
        Interface cloud0
            type: internal
Bridge cloudbr0
    datapath_type: netdev
    Port vnet4
        Interface vnet4
    Port vnet1
        Interface vnet1
    Port cloudbr0
        Interface cloudbr0
            type: internal
    Port eno2
        Interface eno2
            type: dpdk
            options: {dpdk-devargs="id"}
    Port vnet5
        Interface vnet5
    Port vnet2
        Interface vnet2
ovs_version: "2.17.3"

bradsmin Apr 26 '23 08:04

@bradsmin Can you test Open vSwitch without DPDK?

weizhouapache Apr 28 '23 21:04

Yes, already done that. Without DPDK, Open vSwitch and CloudStack work fine on a fresh install.

bradsmin Apr 29 '23 02:04

I have been experiencing the same issue. In my case (I am using Linux bridge), I was able to get guest DNS working by disabling UFW on my Ubuntu KVM hosts. However, I haven't figured out how to make UFW and guest DNS work together for better security.

li-liwen Jan 02 '24 13:01

@bradsmin Have you defined an internal DNS that isn't on your management network? I ask because the SSVM agent adds a routing rule that routes traffic to the zone's internal DNS via its management/private network NIC.
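Whether the internal DNS sits on the management network can be checked with simple subnet arithmetic. The sketch below uses Python's standard ipaddress module; the function name and the addresses in the usage lines are placeholders, since the real management CIDR was not posted in this thread.

```python
import ipaddress


def dns_on_management_network(dns_ip: str, mgmt_cidr: str) -> bool:
    """Return True if dns_ip falls inside the management network mgmt_cidr."""
    # strict=False lets a host address with a prefix length (e.g. 10.1.1.10/24)
    # stand in for its enclosing network.
    network = ipaddress.ip_network(mgmt_cidr, strict=False)
    return ipaddress.ip_address(dns_ip) in network


# Placeholder values for illustration only:
print(dns_on_management_network("8.8.8.8", "10.1.1.0/24"))
print(dns_on_management_network("10.1.1.53", "10.1.1.10/24"))
```

When the first kind of result applies (a public resolver used as internal DNS), queries leave through the management NIC and depend on the management gateway being able to forward them.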

rohityadavcloud Apr 30 '24 10:04

@bradsmin Can you review the comments and advise? Have you also tried @li-liwen's workaround of disabling ufw (or firewalld)?

rohityadavcloud Jun 13 '24 07:06

Yes, internal DNS was defined (Google DNS was provided) at configuration time. I also tried disabling the ufw firewall on the KVM host, but the issue still exists.

bradsmin Jun 14 '24 16:06