Virtual machine on different hosts but in same isolated network not able to communicate (can't ping each other).

Open akshaybachhav opened this issue 1 year ago • 13 comments

ISSUE TYPE
  • Bug Report
  • Other
COMPONENT NAME
VR and instances
CLOUDSTACK VERSION
4.19.0.1
CONFIGURATION
   management server ip address : 10.0.1.10
   gateway : 10.0.1.1
   host machine 1 ip : 10.0.1.18
   host machine 2 ip : 10.0.1.12
   guest network ip range : 10.1.1.0/24
   virtual router public ip address : 10.0.1.32

   virtual machine on host machine 1(10.0.1.18) : 10.1.1.96
   virtual machine on host machine 2(10.0.1.12) : 10.1.1.158
  hypervisor type : KVM

OS / ENVIRONMENT
kvm setup on ubuntu 22.04
SUMMARY
We set up a new CloudStack on Ubuntu 22.04 with advanced networking. We added two hosts in the same zone, same pod and same cluster successfully. Later we created an isolated network and, using an Ubuntu ISO template, created two virtual machines, one on each host.
When we try to ping from one VM to the other, it says network unreachable.

EXPECTED RESULTS
ubuntu-dev-vm-1 should be able to ping the virtual machine ubuntu-dev-vm-2.
ACTUAL RESULTS
PING 10.1.1.158 (10.1.1.158) 56(84) bytes of data.
From 10.1.1.96 icmp_seq=1 Destination Host Unreachable
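
A note on reading this output: the "Destination Host Unreachable" line is reported by 10.1.1.96, which is the sender's own address. On Linux that usually means ARP resolution for the target failed on the local segment, rather than a router rejecting the packet. Both VMs are in 10.1.1.0/24, so the traffic between them is layer 2 and is not routed by the VR. The snippet below is a minimal Python sketch of that reasoning; the helper name `diagnose_unreachable` and its hint strings are made up for illustration and are not part of ping or CloudStack.

```python
import ipaddress
import re

# Matches lines such as: "From 10.1.1.96 icmp_seq=1 Destination Host Unreachable"
UNREACHABLE = re.compile(r"^From (\S+) icmp_seq=\d+ Destination Host Unreachable")


def diagnose_unreachable(ping_output, local_ip, target_ip, guest_cidr):
    """Return a short hint based on which address reported the error."""
    network = ipaddress.ip_network(guest_cidr)
    same_subnet = (
        ipaddress.ip_address(local_ip) in network
        and ipaddress.ip_address(target_ip) in network
    )

    # Collect every address that reported "Destination Host Unreachable".
    reporters = set()
    for line in ping_output.splitlines():
        match = UNREACHABLE.match(line.strip())
        if match:
            reporters.add(match.group(1))

    if not reporters:
        return "no unreachable errors found"
    if same_subnet and reporters == {local_ip}:
        # The sender itself gave up, which points at failed ARP on the segment.
        return "local ARP failure: check the layer 2 path between hosts"
    return "reported by another hop: check " + ", ".join(sorted(reporters))


# Values taken from the report above.
output = """PING 10.1.1.158 (10.1.1.158) 56(84) bytes of data.
From 10.1.1.96 icmp_seq=1 Destination Host Unreachable"""
print(diagnose_unreachable(output, "10.1.1.96", "10.1.1.158", "10.1.1.0/24"))
```

If the hint points at layer 2, the bridge, VLAN or VXLAN path between the two KVM hosts is the first place to look, before any firewall rules inside the VMs.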

Screenshots

The virtual router is running on host 1. When we run diagnostics to ping from the virtual router to the VM (10.1.1.158), it says network unreachable. [image]

The VM 10.1.1.96 was created on the same host that runs the virtual router; in this case the router is able to ping the VM. [image]

akshaybachhav avatar May 02 '24 11:05 akshaybachhav

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] avatar May 02 '24 11:05 boring-cyborg[bot]

@akshaybachhav can you check iptables/firewall rules in the VM? Can you ping the VR from the VMs? You can try tcpdump on icmp across VMs, and hosts to see what's failing/dropping the packets.

rohityadavcloud avatar May 02 '24 11:05 rohityadavcloud

+1, please also check the firewall rules on the KVM hosts.

weizhouapache avatar May 02 '24 12:05 weizhouapache

Thanks for your help. We checked the iptables/firewall rules and found the status below. All traffic is allowed by default. [image]

The tcpdump command on the VM that is on the same host as the VR gives the results below:

04:54:50.697249 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7368836:7369064, ack 4249, win 1002, options [nop,nop,TS val 718803065 ecr 1773071291], length 228
04:54:50.697511 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369064:7369468, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 404
04:54:50.697741 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369468:7369696, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 228
04:54:50.697852 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7369064, win 7689, options [nop,nop,TS val 1773071291 ecr 718803065], length 0
04:54:50.698049 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369696:7369732, ack 4249, win 1002, options [nop,nop,TS val 718803066 ecr 1773071291], length 36
04:54:50.698306 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7369696, win 7689, options [nop,nop,TS val 1773071292 ecr 718803066], length 0
04:54:50.698437 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7369732:7370512, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 780
04:54:50.698664 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370512:7370724, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 212
04:54:50.698896 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370724:7370952, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071292], length 228
04:54:50.699027 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7370512, win 7689, options [nop,nop,TS val 1773071293 ecr 718803066], length 0
04:54:50.699164 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7370952:7371180, ack 4249, win 1002, options [nop,nop,TS val 718803067 ecr 1773071293], length 228
04:54:50.699338 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7370952, win 7689, options [nop,nop,TS val 1773071293 ecr 718803067], length 0
04:54:50.699470 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371180:7371216, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071293], length 36
04:54:50.699699 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371216:7371612, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071293], length 396
04:54:50.699953 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7371216, win 7689, options [nop,nop,TS val 1773071294 ecr 718803067], length 0
04:54:50.700080 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371612:7371840, ack 4249, win 1002, options [nop,nop,TS val 718803068 ecr 1773071294], length 228
04:54:50.700355 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7371840:7372052, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 212
04:54:50.700579 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7371840, win 7689, options [nop,nop,TS val 1773071294 ecr 718803068], length 0
04:54:50.700704 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372052:7372456, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 404
04:54:50.700968 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372456:7372684, ack 4249, win 1002, options [nop,nop,TS val 718803069 ecr 1773071294], length 228
04:54:50.701250 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7372456, win 7689, options [nop,nop,TS val 1773071295 ecr 718803069], length 0
04:54:50.701346 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7372684:7373280, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 596
04:54:50.701570 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373280:7373508, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 228
04:54:50.701822 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373508:7373712, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071295], length 204
04:54:50.701915 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7373280, win 7689, options [nop,nop,TS val 1773071296 ecr 718803069], length 0
04:54:50.702097 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373712:7373940, ack 4249, win 1002, options [nop,nop,TS val 718803070 ecr 1773071296], length 228
04:54:50.702268 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7373712, win 7689, options [nop,nop,TS val 1773071296 ecr 718803070], length 0
04:54:50.702323 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: Flags [P.], seq 7373940:7374152, ack 4249, win 1002, options [nop,nop,TS val 718803071 ecr 1773071296], length 212
04:54:50.702835 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: Flags [.], ack 7374152, win 7689, options [nop,nop,TS val 1773071296 ecr 718803070], length 0
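
Every packet in this capture belongs to the SSH session between the VM and 10.0.1.10; no ICMP or ARP traffic appears, so it cannot show where the pings are lost. Restricting the capture with a filter expression such as `icmp or arp` would likely be more useful. As a rough illustration, the Python sketch below splits pasted tcpdump text into packets and counts them by kind. The helper names `split_packets` and `classify` are hypothetical, and the string matching assumes the default tcpdump text format seen above.

```python
import re
from collections import Counter

# tcpdump's default timestamp, e.g. 04:54:50.697249
TIMESTAMP = r"\d{2}:\d{2}:\d{2}\.\d{6}"


def split_packets(capture):
    """Split pasted tcpdump text into one string per packet.

    Line wraps are collapsed first, then the text is cut in front of
    every timestamp.
    """
    flat = " ".join(capture.split())
    parts = re.split(rf"(?={TIMESTAMP} )", flat)
    return [part.strip() for part in parts if part.strip()]


def classify(packet):
    """Rough classification based on tcpdump's default text output."""
    if " ARP, " in packet:
        return "arp"
    if ": ICMP " in packet:
        return "icmp"
    if ".ssh " in packet or ".ssh:" in packet:
        return "ssh"
    return "other"


# Two packets copied from the capture above, still run together.
sample = (
    "04:54:50.697249 IP ubuntu-dev-vm-1.cs2cloud.internal.ssh > 10.0.1.10.42550: "
    "Flags [P.], seq 7368836:7369064, ack 4249, win 1002, options "
    "[nop,nop,TS val 718803065 ecr 1773071291], length 228 "
    "04:54:50.697852 IP 10.0.1.10.42550 > ubuntu-dev-vm-1.cs2cloud.internal.ssh: "
    "Flags [.], ack 7369064, win 7689, options "
    "[nop,nop,TS val 1773071291 ecr 718803065], length 0"
)
counts = Counter(classify(packet) for packet in split_packets(sample))
print(dict(counts))
```

A count with no `icmp` or `arp` entries, as here, means the capture needs to be repeated with a narrower filter on the VM, on the guest bridge of each host, and on the VR.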

I am able to ping the VR from the VM that is on the same host. But pinging the VR from the VM that is not on the same host does not work.

akshaybachhav avatar May 03 '24 05:05 akshaybachhav

@akshaybachhav can you check the firewall rules on the KVM hosts?

Just to confirm, are both VMs configured to get DHCP IPs from the VR, or do they use static IPs?

weizhouapache avatar May 03 '24 06:05 weizhouapache

Both VMs get DHCP IPs from the VR; no static IP is set in the VMs.

On both KVM machines, ufw reports Status: inactive.

akshaybachhav avatar May 03 '24 06:05 akshaybachhav

Have you disabled AppArmor on them? What guide or installation steps, if any, did you follow, @akshaybachhav? Make sure to stop/disable firewalld etc. It's possible you have a wrong MTU, bridge, or NIC/network configuration if you can't ping VMs on the same VLAN.

rohityadavcloud avatar May 03 '24 17:05 rohityadavcloud

I followed the tutorial from the official Apache CloudStack documentation: [image]

This is my netplan configuration file for host 1 and host 2: [image]

akshaybachhav avatar May 04 '24 05:05 akshaybachhav

I followed the tutorial https://rohityadav.cloud/blog/cloudstack-kvm/ and did exactly the same setup.

akshaybachhav avatar May 07 '24 12:05 akshaybachhav

@akshaybachhav can you create a new VM on the host where the VR is not running, and check

  • if the new VM gets a DHCP IP
  • if the two VMs on the same host can ping each other

weizhouapache avatar May 08 '24 06:05 weizhouapache

@akshaybachhav can you create a new VM on the host where the VR is not running, and check

  • if the new VM gets a DHCP IP --> Yes, the VM gets an IP from DHCP.
  • if the two VMs on the same host can ping each other --> Yes, when the VMs are on the same host they can ping each other.

We have set up CloudStack using a mini PC, an ASUS Chromebox and an Acer Chromebox, which have a single NIC. [image]

akshaybachhav avatar May 09 '24 04:05 akshaybachhav

If you are testing communication between 2 VMs on the SAME network, but on different KVM hosts, then you should:

  • set a static IP (any) from the same subnet (any subnet) on both VMs, to remove DHCP as a variable
  • ensure that the switch ports to which the KVM hosts are connected are in TRUNK mode and allow all required VLANs (including the VLAN used by your isolated network) to pass between all KVM hosts.

The problem you have sounds (99.999%) like an underlying infrastructure/configuration issue.

andrijapanicsb avatar May 10 '24 11:05 andrijapanicsb

  • vm gets ip from dhcp

It looks like we forgot to ask you:

  • What type of zone do you use?
  • Do you use VLAN?

weizhouapache avatar May 10 '24 12:05 weizhouapache

@weizhouapache We are using the Core zone type. [image]

We haven't specified any VLAN/VNI while creating the zone. [image]

@andrijapanicsb We are only able to SSH into the VM that is on the same host as the VR; the other VM, which is on a different host, is not reachable. I will check the TRUNK mode.

dineshjchoudhary avatar May 14 '24 06:05 dineshjchoudhary

Hi @dineshjchoudhary @akshaybachhav

Did you manage to resolve this? We have 6 compute nodes, and our 6th node is having the same issue.

Using:

  • CS 4.19.0
  • Advanced Network
  • Disaggregated Setup
  • Linstor SDS Storage

We have checked:

  • Verified that AppArmor is disabled
  • Verified that iptables on the KVM host is disabled
  • Verified that VLAN trunking is enabled

But in our situation, what we noticed:

  • There are several VPCs created in our cloud.
  • Only 1 network in 1 VPC (A), whose VMs are located on Node 6, has this issue.
  • Strangely, VMs belonging to other VPCs (B, C, etc.) that are also located on Node 6 do not have this issue.
  • If we live migrate VMs of VPC (A) from Node 6 to any other host, the VM regains connectivity to the router. (Live migration works fine, which is strange to me.)

btzq avatar May 15 '24 00:05 btzq

Hi All,

We managed to resolve our issue.

Upon further checking, we found out that the VXLAN on Host 6 was down, while the other VXLANs were up and running fine.

This would explain why:

  • Only VMs from certain networks were affected (because CloudStack creates 1 VXLAN for each network tier)
  • Live migration between Node 6 and other hosts works fine (because live migration uses the management physical network instead of the guest network)

@dineshjchoudhary @akshaybachhav, maybe you should check this too, in case the root cause is the same. When we manually bring the VXLAN back up, it works.

What we don't know yet is why the VXLAN suddenly went down. I'll probably raise another ticket for this.
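
For anyone wanting to run the same check on a KVM host, a minimal Python sketch follows. It assumes the JSON output of iproute2 (`ip -j -d link show`) with the keys `ifname`, `flags`, `linkinfo.info_kind` and `linkinfo.info_data.id`; these key names should be verified against the iproute2 version on the host. The helper names and the sample data are made up for illustration and do not come from the hosts in this thread. The sketch only detects interfaces that lack the administrative `UP` flag, which may or may not be the kind of "down" described above.

```python
import json
import subprocess


def read_links():
    """Run `ip -j -d link show` and return the parsed JSON list (Linux only)."""
    result = subprocess.run(
        ["ip", "-j", "-d", "link", "show"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def down_vxlans(links):
    """Return (ifname, vni) for every VXLAN link without the UP flag."""
    found = []
    for link in links:
        info = link.get("linkinfo", {})
        if info.get("info_kind") != "vxlan":
            continue  # not a VXLAN interface
        if "UP" not in link.get("flags", []):
            vni = info.get("info_data", {}).get("id")
            found.append((link.get("ifname"), vni))
    return found


# Hypothetical sample in the assumed JSON shape.
sample = json.loads("""
[
  {"ifname": "eth0", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"]},
  {"ifname": "vxlan100", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"],
   "linkinfo": {"info_kind": "vxlan", "info_data": {"id": 100}}},
  {"ifname": "vxlan200", "flags": ["BROADCAST", "MULTICAST"],
   "linkinfo": {"info_kind": "vxlan", "info_data": {"id": 200}}}
]
""")
print(down_vxlans(sample))
```

On a real host, `down_vxlans(read_links())` would report the affected interfaces; the check looks at flags rather than `operstate` because VXLAN devices often report an operstate of UNKNOWN even when they are up.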

btzq avatar May 15 '24 04:05 btzq

Interesting. When you say VXLAN interface, is it the physical interface (underlay) or the VTEP attached to the bridge?

zap51 avatar May 27 '24 15:05 zap51

Do you use a multicast group?

If yes, can you check whether this setting impacts you? https://docs.cloudstack.apache.org/projects/archived-cloudstack-getting-started/en/latest/networking/vxlan.html#important-note-on-max-number-of-multicast-groups-and-thus-vxlan-intefaces
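
The linked note concerns the Linux sysctl `net.ipv4.igmp_max_memberships`, whose kernel default is 20, and how it caps the number of multicast-based VXLAN interfaces a host can bring up. A minimal Python sketch of the comparison is below. The helper names are hypothetical, the count of 25 is an invented example rather than data from this thread, and the one-group-per-VXLAN assumption follows the linked note rather than anything measured here.

```python
from pathlib import Path

# Location of the sysctl on Linux; the kernel default value is 20.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/igmp_max_memberships")


def read_max_memberships(default=20):
    """Read the current limit, falling back to the kernel default."""
    try:
        return int(SYSCTL_PATH.read_text().strip())
    except (OSError, ValueError):
        return default


def multicast_headroom(max_memberships, vxlan_count):
    """Groups still available, assuming one multicast group per VXLAN.

    A result of zero or less suggests that new VXLAN interfaces may
    fail to join their group on this host.
    """
    return max_memberships - vxlan_count


# Example: a host at the default limit that needs 25 guest networks.
print(multicast_headroom(20, 25))
```

If the headroom is not positive, raising the sysctl on every KVM host is the adjustment the linked note describes.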

weizhouapache avatar May 27 '24 16:05 weizhouapache