
dns/rfc2136: delete cache files if nsupdate failed

Open perryflynn opened this issue 1 year ago • 10 comments

See #4055.

Similar to #2752, updating my DynDNS Domain via the rfc2136 plugin does not work. I added some log lines to the plugin code and it looks like the update fails at rc.bootup because the internet connection is not established yet. Later on rc.newwanip the plugin reports the IP was not changed and the nsupdate call is skipped.

My workaround is to delete the cache files when the exit code of the nsupdate command is not zero.

perryflynn avatar Jun 24 '24 19:06 perryflynn

@AdSchellevis already mentioned in the issue that a background process should be used here. How can I use a background process here and still get the exit code to decide whether the cache file should be deleted?

Does mwexec_bg support shell pipes? Then I could do something like this:

$cmd .= " /var/etc/nsupdatecmds{$i}";
$cmd .= " || rm -f ".rfc2136_cache_file($dnsupdate, 4)." ".rfc2136_cache_file($dnsupdate, 6);
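
One way to sidestep the question is to move the failure handling into a small wrapper script, so the caller only has to start a single command. The following is a minimal sketch, not the plugin's code: the function name `run_update_or_clear_cache` and its argument order are made up for illustration, and nothing is assumed about whether mwexec_bg passes shell operators through.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): run an update command and
# remove both cache files if it exits with a non-zero status.
# Usage: run_update_or_clear_cache <cache4> <cache6> <command> [args...]
run_update_or_clear_cache() {
    cache4="$1"
    cache6="$2"
    shift 2

    # Capture the exit status without tripping "set -e" in the caller.
    rc=0
    "$@" || rc=$?

    if [ "$rc" -ne 0 ]; then
        # The update failed, so forget the cached addresses; the next
        # run then no longer sees the IP as unchanged.
        rm -f "$cache4" "$cache6"
    fi
    return "$rc"
}
```

Appending `&` to the call puts the whole function in the background, so the caller does not wait for the update command to finish.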

perryflynn avatar Jun 24 '24 19:06 perryflynn

I have now added a wrapper that handles the cache files in the background, but now it is not updating on the bootup or newwanip hook. In addition to that, I added a cron:

image

The cron was executed 13 minutes after newwanip and then the update ran through.

OPNsense "General" log:

2024-06-29T13:15:00	Error	opnsense	/usr/local/etc/rc.rfc2136: Dynamic DNS ...
2024-06-29T13:02:22	Error	opnsense	/usr/local/etc/rc.newwanip: Dynamic DNS ...
2024-06-29T13:01:44	Error	opnsense	/usr/local/etc/rc.bootup: Dynamic DNS ...

Output of nsupdate (I added a 2>&1 >> /var/log/nsupdate.log to the wrapper script):

Sat Jun 29 13:01:44 CEST 2024
; Communication with 94.247.xx.xx#53 failed: operation canceled
could not reach any name server
Sat Jun 29 13:02:22 CEST 2024
couldn't get address for 'ns1.example.com': not found
syntax error
Sat Jun 29 13:15:00 CEST 2024
Result: 0
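
A side note on the redirection mentioned above: in sh, redirections are processed left to right, so `2>&1 >> file` and `>> file 2>&1` behave differently. Whether this matters for the wrapper depends on how it was actually written; the sketch below only demonstrates the general shell behaviour, with a made-up `emit` function standing in for the real command.

```shell
#!/bin/sh
# Stand-in for a command that writes to both streams (hypothetical).
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_a=$(mktemp "${TMPDIR:-/tmp}/redir_a.XXXXXX")
log_b=$(mktemp "${TMPDIR:-/tmp}/redir_b.XXXXXX")

# stderr is first pointed at the ORIGINAL stdout, then stdout is moved
# to the file: only the stdout line ends up in log_a.
emit 2>&1 >> "$log_a"

# stdout is moved to the file first, then stderr is pointed at it:
# both lines end up in log_b.
emit >> "$log_b" 2>&1
```

If error messages should land in the log file as well, the second form is the one to use.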

At first I thought it was a DNS resolution issue, so I added my DNS server as an override to unbound. But it's still not working when the hooks trigger the nsupdate.

To me it looks like the hooks are running into a race condition.

  • On bootup it resolves the IP with unbound but cannot connect to the DNS server
  • On newwanip it cannot resolve the DNS server anymore
  • ~15 minutes later the cron triggers and all is working fine.

DNS setup on my OPNsense box:

  • 8.8.8.8 / 8.8.4.4 as hardcoded dns servers
  • Prefer IPv4 over IPv6
  • Do not allow overriding DNS server list by PPPoE

What could be the issue here? What could I do here?

perryflynn avatar Jun 29 '24 11:06 perryflynn

At first glance I'm afraid you're overcomplicating the code now. When the hook fires multiple times, the command executed may now be different due to the conditionals.

I would really opt for a simpler setup, less code, less chance of regressions. Maybe keeping it as simple as removing the cache files when execution failed is an easier approach (when restructuring the filenames, the script is almost a one-liner). Trying ddclient as alternative might also be worth the effort.

AdSchellevis avatar Jun 29 '24 16:06 AdSchellevis

No, it triggers exactly as before. See #4055. I think the problem from the beginning was that the logic does not wait for a working internet connection.

The cron is just a workaround for that problem.

> I would really opt for a simpler setup, less code, less chance of regressions.

Definitely. But I'm not sure how to achieve that, as the nsupdate should not block the bootup / hook execution.

> Trying ddclient as alternative might also be worth the effort.

I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection? Or should that already be the case when the newwanip hook is triggered, and the problem is maybe somewhere else?

perryflynn avatar Jun 29 '24 21:06 perryflynn

> Definitely. But I'm not sure how to achieve that, as the nsupdate should not block the bootup / hook execution.

I see. Maybe it's better to use a local workaround for your use-case, maybe a late syshook start script (https://docs.opnsense.org/development/backend/autorun.html) removing the cache files and manually restarting the process.

To guide this into a mergeable state will likely cost more time than I can offer (maybe even more than engineering a solution myself, in which case adding the functionality to our native backend in the ddclient plugin would likely make more sense).

> I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection?

Not really; "working" is a fluid concept. When nsupdate returns an error code, I would use that. If it doesn't, you might try to resolve the remote host, but this might complicate the situation even further.

AdSchellevis avatar Jun 30 '24 08:06 AdSchellevis

I just want to throw this into the mix as an alternative: https://docs.opnsense.org/manual/how-tos/caddy.html#use-dynamic-dns-in-client-mode-only It has this compiled in, ready to use: https://github.com/caddy-dns/rfc2136

Monviech avatar Jun 30 '24 11:06 Monviech

@Monviech

Interesting, will try that.

@AdSchellevis

What about the potential race condition in the newwanip hook? It looks like that hook is used by many components. Should a working internet connection be available when the hook is triggered, or does every piece of software have to take care of that itself?

If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?

The plugin would still be dirty, but at least it can work in combination with the cron created in the OPNsense settings. Right now the plugin is broken for me.

perryflynn avatar Jul 01 '24 21:07 perryflynn

> What about the potential race condition in the newwanip hook? It looks like that hook is used by many components. Should a working internet connection be available when the hook is triggered, or does every piece of software have to take care of that itself?

newwanip is merely an event you can subscribe to when a network device receives an address; it doesn't guarantee internet connectivity in any way (https://docs.opnsense.org/development/backend/legacy.html#configure).

> If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?

I'm afraid not, as it moves the action into the foreground; people's setups might freeze during boot as an unintended side effect.

> The plugin would still be dirty, but at least it can work in combination with the cron created in the OPNsense settings. Right now the plugin is broken for me.

I don't mind the plugin being bad code, as we don't support it officially anyway, but when we merge changes which break it for others, it will cost time which would be better spent elsewhere. For us this currently just isn't a priority. If we could push it forward with some tips and review work, that's ok, but only within reasonable limits.

AdSchellevis avatar Jul 02 '24 07:07 AdSchellevis

While I would like to help I'm not knowledgeable in the OPNsense framework so I'm sorry for the uninformed question.

Wouldn't just creating the cache/cache6 files after a successful update (instead of before) not only fix this bug, but be the correct thing to do in this flow? Just moving file_put_contents() after the nsupdate exec.
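
The reordering suggested here can be sketched in shell terms as follows. This is an illustration under assumptions, not the plugin's PHP code: `update_then_cache` is a hypothetical helper, and the cache file is assumed to hold just the IP address.

```shell
#!/bin/sh
# Hypothetical helper: run the update command first and record the new
# address in the cache file only if the command succeeded.
# Usage: update_then_cache <cache_file> <new_ip> <command> [args...]
update_then_cache() {
    cache_file="$1"
    new_ip="$2"
    shift 2

    # Capture the exit status without tripping "set -e" in the caller.
    rc=0
    "$@" || rc=$?

    if [ "$rc" -eq 0 ]; then
        # Success: the cache now reflects what was really sent.
        printf '%s\n' "$new_ip" > "$cache_file"
    fi
    # On failure the old cache content is left untouched, so a later
    # run still sees a difference and tries again.
    return "$rc"
}
```

With this ordering a failed update leaves the previously cached address in place, so nothing has to be deleted afterwards.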

Deleting the cache would break the IPs shown in the edit menu as it reads the last updated IP from the cache.

Although we would need a way for the plugin to retry failed updates, and from what I understand this is only called when the WAN interface changes IP. Is that where the complexity comes in?
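
A retry could look roughly like the sketch below. It is a generic shell pattern, not something the plugin currently does; the `retry` function, its parameters, and the fixed delay between attempts are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <max> times, sleeping <delay>
# seconds between attempts, and stop at the first success.
# Usage: retry <max> <delay> <command> [args...]
retry() {
    max="$1"
    delay="$2"
    shift 2

    attempt=1
    while :; do
        if "$@"; then
            return 0            # the command succeeded
        fi
        if [ "$attempt" -ge "$max" ]; then
            return 1            # out of attempts, report failure
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Started in the background (for example `retry 10 60 <update command> &`), the loop would not hold up the hook that launched it.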

z411 avatar May 16 '25 20:05 z411

I'm also running into this: https://github.com/opnsense/plugins/issues/4055#issuecomment-2932854201

> Trying ddclient as alternative might also be worth the effort.

FYI it seems to me that ddclient does not currently have support for RFC2136: https://github.com/opnsense/plugins/tree/master/dns/ddclient/src/opnsense/scripts/ddclient/lib/account

cpatulea avatar Jun 02 '25 23:06 cpatulea