dns/rfc2136: delete cache files if nsupdate failed
See #4055.
Similar to #2752, updating my DynDNS Domain via the rfc2136 plugin does not work. I added some log lines to the plugin code and it looks like the update fails at rc.bootup because the internet connection is not established yet. Later on rc.newwanip the plugin reports the IP was not changed and the nsupdate call is skipped.
My workaround is to delete the cache files when the exit code of the nsupdate command is not zero.
@AdSchellevis already mentioned in the issue that here a background process should be used. How can I use a background process here and get the exit code to decide if the cache file should be deleted?
Does mwexec_bg support shell pipes? Then I could do something like this:
$cmd .= " /var/etc/nsupdatecmds{$i}";
$cmd .= " || rm -f ".rfc2136_cache_file($dnsupdate, 4)." ".rfc2136_cache_file($dnsupdate, 6);
I added now a wrapper for handling the cache files in background, but now it is not updating on bootup or newwanip hook. I added in addition to that a cron:
The cron was executed 13 minutes after newwanip and then the update ran through.
OPNSense "General" Log:
2024-06-29T13:15:00 Error opnsense /usr/local/etc/rc.rfc2136: Dynamic DNS ...
2024-06-29T13:02:22 Error opnsense /usr/local/etc/rc.newwanip: Dynamic DNS ...
2024-06-29T13:01:44 Error opnsense /usr/local/etc/rc.bootup: Dynamic DNS ...
Output of nsupdate (I added a 2>&1 >> /var/log/nsupdate.log to the wrapper script):
Sat Jun 29 13:01:44 CEST 2024
; Communication with 94.247.xx.xx#53 failed: operation canceled
could not reach any name server
Sat Jun 29 13:02:22 CEST 2024
couldn't get address for 'ns1.example.com': not found
syntax error
Sat Jun 29 13:15:00 CEST 2024
Result: 0
First I thought it is a DNS resolution issue and I added my DNS server as override to unbound. But it's still not working when the hooks are triggering the nsupdate.
For me it looks like as the hooks are having race conditions.
- On bootup it resolves the IP with unbound but cannot connect to the DNS server
- On newwanip it cannot resolve the DNS server anymore
- ~15 minutes later the cron triggers and all is working fine.
DNS setup on my OPNSense box:
- 8.8.8.8 / 8.8.4.4 as hardcoded dns servers
- Prefer IPv4 over IPv6
- Do not allow overriding DNS server list by PPPoE
What could be the issue here? What could I do here?
At first glance I'm afraid your overcomplicating the code now, when the hook fires multiple times, the command executed maybe different know due the conditionals.
I would really opt for a simpler setup, less code, less chance of regressions. Maybe keeping it as simple as removing the cache files when execution failed is an easier approach (when restructuring the filenames, the script is almost a one-liner).
Trying ddclient as alternative might also be worth the effort.
No, it triggers extactly as before. See #4055. I think the problem was from the beginning, that the logic does not wait for a working internet connection.
The cron is just a workaround for that problem.
I would really opt for a simpler setup, less code, less chance of regressions.
Definitively. But not sure how to archieve that, as the nsupdate should not block the bootup / hook execution.
Trying ddclient as alternative might also be worth the effort.
I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection? Should this maybe the case when the newwanip hook is triggered and maybe there is a problem somwhere else?
Definitively. But not sure how to archieve that, as the nsupdate should not block the bootup / hook execution.
I see. Maybe it's better to use a local workaround for your use-case, maybe a late syshook start script (https://docs.opnsense.org/development/backend/autorun.html) removing the cache files and manually restarting the process.
To guide this into a mergable state will likely cost more time than I can offer (maybe even more than engineering a solution myself, in which case adding the functionality to our native backend in the ddclient plugin likely would make more sense).
I will try that, but I wanted to finish here first. Is there a function in the opnsense code where I can check if there is a working internet connection?
Not really, working is a fluid concept, when nsupdate returns an error code, I would use that, if it doesn't you might try to resolve the remote host, but this might complicate the situation even further.
I just want to throw this into the mix as alternative: https://docs.opnsense.org/manual/how-tos/caddy.html#use-dynamic-dns-in-client-mode-only It has this compiled in ready to use: https://github.com/caddy-dns/rfc2136
@Monviech
Interesting, will try that.
@AdSchellevis
What about the potential race condition in the newwanip hook? That hook is used by many components as it looks like. Should a working internet connection be available when the hook is triggered or has every software take care about that itself?
If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?
The plugin would be still dirty, but at least it can work in combination with the cron created in the OPNSense settings. Right now the plugin is broken for me.
What about the potential race condition in the newwanip hook? That hook is used by many components as it looks like. Should a working internet connection be available when the hook is triggered or has every software take care about that itself?
newwanip is merely an event you can subscribe to when a network device receives an address, it doesn't guarantee internet connectivity in any way (https://docs.opnsense.org/development/backend/legacy.html#configure)
If there is no easy fix, can we at least merge one of the two workarounds I provided for deleting / not creating the cache files?
I'm afraid not as it moves the action in the foreground, peoples setups might freeze during boot as unintended side affect.
The plugin would be still dirty, but at least it can work in combination with the cron created in the OPNSense settings. Right now the plugin is broken for me.
I don't mind the plugin being bad code, as we don't support it officially anyway, but when we merge changes which breaks it for others, it will cost time which would be better spend elsewhere. For us this currently just isn't a priority, if we could push it forward with some tips and review work, that's ok, but only within reasonable limits.
While I would like to help I'm not knowledgeable in the OPNsense framework so I'm sorry for the uninformed question.
Wouldn't just creating the cache/cache6 files after a successful update (instead of before) not only fix this bug, but be the correct thing to do in this flow? Just moving file_put_contents() after the nsupdate exec.
Deleting the cache would break the IPs shown in the edit menu as it reads the last updated IP from the cache.
Although we would need a way for the plugin to retry failed updates, and from what I understand this is only called when the WAN interface changes IP. Is that where the complexity comes in?
I'm also running into this: https://github.com/opnsense/plugins/issues/4055#issuecomment-2932854201
Trying ddclient as alternative might also be worth the effort.
FYI it seems to me that ddclient does not currently have support for RFC2136: https://github.com/opnsense/plugins/tree/master/dns/ddclient/src/opnsense/scripts/ddclient/lib/account