dnsmasq-dhcp - ignored

Open ccycv opened this issue 2 years ago • 17 comments

ISSUE TYPE
  • Bug Report
COMPONENT NAME
VR - DHCP
CLOUDSTACK VERSION
ACS 4.18.1 with router upgraded to 4.18.1
CONFIGURATION
Shared network configuration with multiple CIDRs in different zones. The DHCP service is provided by dnsmasq.
OS / ENVIRONMENT
Cloudstack + VMware 
SUMMARY

The DHCP service on the CloudStack guest router is ignoring DHCP requests for specific IP classes post deployment. Manually adding missing classes to the configuration resolves the issue temporarily.

STEPS TO REPRODUCE
- The issue occurs when the system is configured with multiple CIDRs and DHCP options for some classes are missing.
- Comparing the `dhcp-option` entries from a router with a working configuration to the one with issues shows a lack of certain DHCP options for the non-working IP classes.
- After adding the missing classes manually to `cloud.conf`, the issue seems to be resolved, suggesting a bug in DHCP configuration generation or application.

root@r-6143-VMAMS:/etc/dnsmasq.d# cat cloud.conf
listen-address=127.0.0.1,23.29.xxx.130
dhcp-range=set:interface-eth0-0,23.29.xxx.130,static
dhcp-option=tag:interface-eth0-0,15,my.host
dhcp-option=tag:interface-eth0-0,6,23.29.xxx.130,8.8.8.8
dhcp-option=tag:interface-eth0-0,3,23.29.xxx.129
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224
root@r-6143-VMAMS:/etc/dnsmasq.d#
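
The comparison described in the steps above can be scripted. This is a minimal sketch, assuming the `cloud.conf` layout shown above (`dhcp-range=set:<tag>,<ip>,static` and `dhcp-option=tag:<tag>,<code>,<value>`). The function names are hypothetical and not part of CloudStack, and the sample uses documentation addresses (192.0.2.0/24, 198.51.100.0/24) as placeholders for the masked ones.

```python
import re


def parse_cloud_conf(text):
    """Return {tag: {"range_ip": ip, "options": {code: value}}} from cloud.conf text."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"dhcp-range=set:([^,]+),([^,]+),static$", line)
        if m:
            tags.setdefault(m.group(1), {"options": {}})["range_ip"] = m.group(2)
            continue
        m = re.match(r"dhcp-option=tag:([^,]+),(\d+),(.+)$", line)
        if m:
            entry = tags.setdefault(m.group(1), {"options": {}})
            entry["options"][m.group(2)] = m.group(3)
    return tags


def missing_gateways(text, expected_gateways):
    """Gateways (DHCP option 3) that no tagged block in cloud.conf announces."""
    configured = {t["options"].get("3") for t in parse_cloud_conf(text).values()}
    return [gw for gw in expected_gateways if gw not in configured]


# Placeholder config with a single guest range, mirroring the layout above.
SAMPLE = """listen-address=127.0.0.1,192.0.2.2
dhcp-range=set:interface-eth0-0,192.0.2.2,static
dhcp-option=tag:interface-eth0-0,15,my.host
dhcp-option=tag:interface-eth0-0,6,192.0.2.2,8.8.8.8
dhcp-option=tag:interface-eth0-0,3,192.0.2.1
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224
"""

# The second gateway stands in for a guest range that was never written out.
print(missing_gateways(SAMPLE, ["192.0.2.1", "198.51.100.1"]))
```

Running it against the file from an affected router, with the gateways of all guest ranges defined on the shared network, should list the ranges that have no block in `cloud.conf`.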


root@r-6143-VMAMS:~# grep -v '^#\|^$' /etc/dnsmasq.conf
domain-needed
bogus-priv
resolv-file=/etc/dnsmasq-resolv.conf
local=/my.host/
except-interface=eth1
except-interface=eth2
except-interface=lo
no-dhcp-interface=eth1
no-dhcp-interface=eth2
expand-hosts
domain=my.host
domain=my.host
domain=my.host
dhcp-range=217.79.xxx.129,static
dhcp-hostsfile=/etc/dhcphosts.txt
dhcp-ignore=tag:!known
dhcp-option=15,"my.host "
dhcp-option=vendor:MSFT,2,1i
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/opt/tftpboot
dhcp-lease-max=2100
domain=shape.host
log-facility=/var/log/dnsmasq.log
conf-dir=/etc/dnsmasq.d
dhcp-optsfile=/etc/dhcpopts.txt
localise-queries
dhcp-option=option:router,217.79.xxx.129
dhcp-option=6,217.79.xxx.130,8.8.8.8
dhcp-client-update

EXPECTED RESULTS
The expected result is that all the IP classes should be served by the DHCP without needing manual intervention. All IP ranges should have appropriate `dhcp-option` configurations applied automatically.
ACTUAL RESULTS
Some IP classes are being ignored by the DHCP service, leading to instances not receiving an IP upon boot. This issue is observed across different CIDRs in the same zone and requires manual addition of missing `dhcp-option` configurations to resolve.

ccycv avatar Oct 30 '23 13:10 ccycv

Additional info:

Tue 31 Oct 2023 08:32:15 AM UTC Setting up dnsmasq
2023-10-31 08:32:15,789 INFO Wrote edited file /etc/dnsmasq.d/cloud.conf
2023-10-31 08:32:15,789 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:15,790 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:32:15,790 ERROR Caught error while trying to delete entries from dnsmasq.leases file: [Errno 2] No such file or directory: '/etc/dhcphosts.txt'
2023-10-31 08:32:15,790 INFO Executing: systemctl restart dnsmasq
2023-10-31 08:32:15,825 INFO Service dnsmasq restart
2023-10-31 08:32:15,828 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:15,828 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:15,828 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:15,833 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:15,840 INFO Service dnsmasq reload
2023-10-31 08:32:19,962 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:19,962 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:19,963 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:19,966 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:19,973 INFO Service dnsmasq reload
2023-10-31 08:32:25,427 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:25,427 INFO Wrote edited file /var/lib/misc/dnsmasq.leases
2023-10-31 08:32:25,427 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:32:25,428 INFO Deleted 0 entries from dnsmasq.leases file
2023-10-31 08:32:25,428 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:25,432 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:25,438 INFO Service dnsmasq reload
2023-10-31 08:32:26,653 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:32:26,653 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:32:26,654 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:32:26,659 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:32:26,670 INFO Service dnsmasq reload
2023-10-31 08:38:48,777 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:38:48,777 INFO Wrote edited file /var/lib/misc/dnsmasq.leases
2023-10-31 08:38:48,778 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:38:48,778 INFO Deleted 0 entries from dnsmasq.leases file
2023-10-31 08:38:48,778 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:38:48,784 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:38:48,796 INFO Service dnsmasq reload
2023-10-31 08:41:04,420 INFO Nothing to commit. The /etc/dnsmasq.d/cloud.conf file did not change
2023-10-31 08:41:04,420 INFO Nothing to commit. The /var/lib/misc/dnsmasq.leases file did not change
2023-10-31 08:41:04,420 INFO Attempting to delete entries from dnsmasq.leases file for VMs which are not on dhcphosts file
2023-10-31 08:41:04,427 INFO Deleted 1 entries from dnsmasq.leases file
2023-10-31 08:41:04,428 INFO Executing: systemctl is-active dnsmasq
2023-10-31 08:41:04,434 INFO Executing: systemctl reload dnsmasq
2023-10-31 08:41:04,447 INFO Service dnsmasq reload
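
For anyone reading a log like this, a minimal sketch for pulling out only the ERROR entries. It assumes the `YYYY-MM-DD HH:MM:SS,mmm LEVEL message` entry format seen in the excerpt above and works whether entries sit on separate lines or are pasted as one run. `split_entries` is a hypothetical helper name, not a CloudStack function.

```python
import re

# Timestamp and level prefix as they appear in the excerpt above (assumed format).
LOG_ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (INFO|ERROR|WARNING|DEBUG) "
)


def split_entries(blob):
    """Split a log blob into (timestamp, level, message) tuples."""
    # re.split with capturing groups yields:
    # [preamble, ts1, level1, msg1, ts2, level2, msg2, ...]
    parts = LOG_ENTRY.split(blob)
    return [
        (parts[i], parts[i + 1], parts[i + 2].strip())
        for i in range(1, len(parts) - 2, 3)
    ]


BLOB = (
    "2023-10-31 08:32:15,790 INFO Attempting to delete entries from "
    "dnsmasq.leases file for VMs which are not on dhcphosts file "
    "2023-10-31 08:32:15,790 ERROR Caught error while trying to delete entries "
    "from dnsmasq.leases file: [Errno 2] No such file or directory: "
    "'/etc/dhcphosts.txt' "
    "2023-10-31 08:32:15,790 INFO Executing: systemctl restart dnsmasq"
)

errors = [entry for entry in split_entries(BLOB) if entry[1] == "ERROR"]
for timestamp, level, message in errors:
    print(timestamp, level, message)
```

In the excerpt above the only ERROR entry is the missing `/etc/dhcphosts.txt`, logged once right before the first dnsmasq restart.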

ccycv avatar Oct 31 '23 08:10 ccycv

I have also started having issues with one of two shared networks here; my VR no longer assigns IP addresses. Restarting the network including cleanup (replacing the VR) did not help. Maybe it's the same issue (if not, I will open a new one).

In my case, the DHCP requests reach the VR, but are actively declined, for whatever reason. My logs contain lines like `not using configured address xx.yy.zz.aa because it was previously declined`.

@ccycv can you check whether the requests really are ignored or rather declined? Try running e.g. `dhclient eth0 -v` on a client and check whether there are DHCPDECLINE messages.

Also, I can see the DHCP requests on the VR (with `tcpdump -i eth0 ether src <client-vm-mac-address>`).

MartinEmrich avatar Oct 31 '23 12:10 MartinEmrich

I just updated a router by mistake, and now I also have this issue in a different region.

Check the content of /etc/dnsmasq.d/cloud.conf, and compare with your interfaces, inside your router.

Regards, Cristian

ccycv avatar Oct 31 '23 12:10 ccycv

The /etc/dnsmasq.d/cloud.conf files of my working and non-working routers are identical (apart from the network address, of course).

MartinEmrich avatar Oct 31 '23 13:10 MartinEmrich

@ccycv @MartinEmrich is this still an issue for you? Do you have any tangible data, like configuration file contents, that go with your issue?

DaanHoogland avatar Jan 12 '24 14:01 DaanHoogland

@DaanHoogland Something was changed in the new version; I have this issue with 3 separate CloudStack environments after the upgrade to 4.18.1. All the routers with a shared network that have more than one Guest IP range are affected. So when I check the /etc/dnsmasq.d/cloud.conf, the config includes only one Guest IP range. For now, I configured manually on each, but after router rebuild/cleanup, I must do it again.

So, what data do you need?

ccycv avatar Jan 12 '24 16:01 ccycv

Hmm IIRC in the end it was a second VM which had the same IP address for whatever reason. So I guess the VR already assigned the IP to that machine, and thus declined the second request from this machine (different MAC not in the table).

After deleting that rogue VM, the issue did not reappear for me. So most probably not the same as @ccycv's issue at all.

MartinEmrich avatar Jan 12 '24 16:01 MartinEmrich

OK, please create a new issue when you have a clear picture, @MartinEmrich.

DaanHoogland avatar Jan 12 '24 18:01 DaanHoogland

> @DaanHoogland Something was changed in the new version; I have this issue with 3 separate CloudStack environments after the upgrade to 4.18.1. All the routers with a shared network that have more than one Guest IP range are affected. So when I check the /etc/dnsmasq.d/cloud.conf, the config includes only one Guest IP range. For now, I configured manually on each, but after router rebuild/cleanup, I must do it again.
>
> So, what data do you need?

A detailed description of the environment that causes it. So far, what I understand:

  • multiple zones
  • a shared network

I am not quite sure about the "multiple CIDR in different zones" part. How did you configure this, @ccycv?

DaanHoogland avatar Jan 12 '24 19:01 DaanHoogland

So, on any shared network you can add multiple guest CIDRs. This is not a new setup; it has been running for many years and was upgraded from 4.15...

I have shared networks with multiple guest ranges/CIDRs. Only the shared networks are affected.

Regards, Cristian

ccycv avatar Jan 12 '24 19:01 ccycv

It is quite easy to reproduce: create a shared network, regardless of whether the deployment uses basic or advanced networking, and add multiple ranges to it, let's say 5. Then check the dnsmasq cloud.conf config and deploy VMs.

I also have a CloudStack with one zone, basic networking, and I have the same issue.

ccycv avatar Jan 12 '24 19:01 ccycv

Ah, I got confused about the multiple zones. Thanks, I'll have a go on reproducing it.

DaanHoogland avatar Jan 12 '24 20:01 DaanHoogland

> It is quite easy to reproduce: create a shared network, regardless of whether the deployment uses basic or advanced networking, and add multiple ranges to it, let's say 5. Then check the dnsmasq cloud.conf config and deploy VMs.
>
> I also have a CloudStack with one zone, basic networking, and I have the same issue.

If it worked before, it might be a regression. As I remember, it worked when I tested #5530 (see the description of the second commit in the PR). There were very few changes to the VR scripts recently, so it may be caused by some management server changes; this needs investigation.

weizhouapache avatar Jan 12 '24 20:01 weizhouapache

Update:

I recently tested a new router and started with just one IP class. When I introduced an additional IP class, I noticed the cloud.conf file in the /etc/dnsmasq.d directory had been updated with the new configuration for the new range. However, it replaced the existing configuration instead of adding to it. It seems the process overwrites the cloud.conf file without retaining the previous settings.

ccycv avatar Feb 06 '24 08:02 ccycv

@ccycv can you give more details, for example the IP of the VM and the content of cloud.conf?

weizhouapache avatar Feb 06 '24 09:02 weizhouapache

@weizhouapache It is easy to replicate. Create a shared-type network in a CloudStack with advanced networking, then just add an additional Guest IP range and you will see how cloud.conf is populated. I didn't have this issue in the previous version.

This was the configuration with the initial Guest IP range:

listen-address=127.0.0.1,181.xx.xxx.178
dhcp-range=set:interface-eth0-0,181.xx.xxx.178,static
dhcp-option=tag:interface-eth0-0,15,shape.host
dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.178,8.8.8.8,8.8.4.4
dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.177
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.240

After I added an additional Guest IP range, it was replaced with the last added one.

listen-address=127.0.0.1,181.xx.xxx.245
dhcp-range=set:interface-eth0-0,181.xx.xxx.245,static
dhcp-option=tag:interface-eth0-0,15,shape.host
dhcp-option=tag:interface-eth0-0,6,181.xx.xxx.245,8.8.8.8,8.8.4.4
dhcp-option=tag:interface-eth0-0,3,181.xx.xxx.225
dhcp-option=eth0,26,1500
dhcp-option=tag:interface-eth0-0,1,255.255.255.224

Before, when there was no issue, there was a config for each class in this cloud.conf; now, no matter how many Guest IP ranges you add, you will have only one.
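
To make the expected result concrete, here is a rough sketch of a `cloud.conf` with one tagged block per Guest IP range, built from the two masked ranges above. It is not the CloudStack generator: `render_cloud_conf` is a hypothetical name, and both the per-range tag suffix (`interface-eth0-0`, `interface-eth0-1`) and the combined `listen-address` line are assumptions about how the older, working output looked.

```python
def render_cloud_conf(device, ranges, domain, mtu=1500):
    """Build cloud.conf text with one tagged block per guest range.

    ranges: list of dicts with keys router_ip, gateway, netmask, dns (list of str).
    """
    listen = ["127.0.0.1"] + [r["router_ip"] for r in ranges]
    lines = ["listen-address=" + ",".join(listen)]
    for idx, r in enumerate(ranges):
        # Assumption: each range gets its own tag, numbered per device.
        tag = f"interface-{device}-{idx}"
        lines += [
            f"dhcp-range=set:{tag},{r['router_ip']},static",
            f"dhcp-option=tag:{tag},15,{domain}",
            f"dhcp-option=tag:{tag},6,{','.join(r['dns'])}",
            f"dhcp-option=tag:{tag},3,{r['gateway']}",
            f"dhcp-option=tag:{tag},1,{r['netmask']}",
        ]
    lines.append(f"dhcp-option={device},26,{mtu}")
    return "\n".join(lines) + "\n"


# The two guest ranges from this comment, still masked.
RANGES = [
    {
        "router_ip": "181.xx.xxx.178",
        "gateway": "181.xx.xxx.177",
        "netmask": "255.255.255.240",
        "dns": ["181.xx.xxx.178", "8.8.8.8", "8.8.4.4"],
    },
    {
        "router_ip": "181.xx.xxx.245",
        "gateway": "181.xx.xxx.225",
        "netmask": "255.255.255.224",
        "dns": ["181.xx.xxx.245", "8.8.8.8", "8.8.4.4"],
    },
]

print(render_cloud_conf("eth0", RANGES, "shape.host"))
```

The point of the sketch is only that both ranges end up in the file at once; with the behaviour described here, the block for the first range disappears as soon as the second range is added.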

ccycv avatar Feb 06 '24 09:02 ccycv

@ccycv thanks for sharing. I was able to reproduce the issue.

It worked fine when I tested #5530 (https://github.com/apache/cloudstack/commit/9f5ac89c9a01619bff61bacb94182bae7e0336eb). Maybe some changes after that caused it. cc @DaanHoogland

weizhouapache avatar Feb 06 '24 13:02 weizhouapache

Has this been fixed now @weizhouapache ?

rohityadavcloud avatar Apr 30 '24 11:04 rohityadavcloud

@rohityadavcloud yes, it has been fixed by #8741. Closing.

weizhouapache avatar Apr 30 '24 11:04 weizhouapache