plugins icon indicating copy to clipboard operation
plugins copied to clipboard

os-bind: journals for primary zones can get corrupted on a reboot

Open drivera-armedia opened this issue 1 month ago • 2 comments

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

  • [X] I have read the contributing guide lines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
  • [X] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
  • [X] The title contains the plugin to which this issue belongs

Describe the bug I've recently migrated my DNS servers into OPNSense using bind, where one firewall is the primary (i.e. the zone owner) and the other has the failover replicated zone. Everything works fine and dandy right up until I have to reboot the primary (i.e. for an update).

Upon reboot, bind won't come up cleanly because the journal files for the primary zones will somehow get corrupted.

To Reproduce

It's not 100% of the time, but it happens often enough that it's a nuisance.

  1. Enable Bind DNS
  2. Create a primary zone
  3. Add a few records to the primary zone
  4. Reboot
  5. Bind will start but will complain that the journal for the primary zone is corrupted

Expected behavior Bind comes up cleanly and continues serving DNS records from its configured primary zones.

Reading around, it seems that running rndc sync -clean prior to rndc stop would be sufficient to curb this behavior since it would force Bind to sync out the journal files prior to shutdown, like so:

# from /usr/local/etc/rc.d/named
        echo 'Stopping named.'
        rndc sync -clean || true  # <---- add this line!
        if rndc stop; then
                wait_for_pids ${rc_pid}
        else

Relevant log files

2025-12-04T17:32:23-06:00	Notice	named	 04-Dec-2025 17:32:23.296 general: notice: running
2025-12-04T17:32:23-06:00	Notice	named	 04-Dec-2025 17:32:23.295 general: notice: FIPS mode is disabled
2025-12-04T17:32:23-06:00	Notice	named	 04-Dec-2025 17:32:23.294 general: notice: all zones loaded
2025-12-04T17:32:23-06:00	Informational	named	 04-Dec-2025 17:32:23.293 zoneload: info: zone 168.192.in-addr.arpa/IN: loaded serial 2511191540
2025-12-04T17:32:23-06:00	Error	named	 04-Dec-2025 17:32:23.291 zoneload: error: zone rivera.prv/IN: not loaded due to errors.
2025-12-04T17:32:23-06:00	Error	named	 04-Dec-2025 17:32:23.291 zoneload: error: zone rivera.prv/IN: journal rollforward failed: journal out of sync with zone
2025-12-04T17:32:23-06:00	Informational	named	 04-Dec-2025 17:32:23.291 zoneload: info: zone 127.in-addr.arpa/IN: loaded serial 42
2025-12-04T17:32:23-06:00	Informational	named	 04-Dec-2025 17:32:23.289 zoneload: info: zone localhost/IN: loaded serial 42
2025-12-04T17:32:23-06:00	Informational	named	 04-Dec-2025 17:32:23.286 zoneload: info: zone 0.ip6.arpa/IN: loaded serial 42
2025-12-04T17:32:23-06:00	Informational	named	 04-Dec-2025 17:32:23.278 zoneload: info: managed-keys-zone: loaded serial 78
2025-12-04T17:32:23-06:00	Notice	named	 04-Dec-2025 17:32:23.276 general: notice: command channel listening on 127.0.0.1#9530

Later, after manually cleaning the files out (via rm):

2025-12-04T17:39:37-06:00	Notice	named	 04-Dec-2025 17:39:37.902 general: notice: running
2025-12-04T17:39:37-06:00	Notice	named	 04-Dec-2025 17:39:37.902 general: notice: FIPS mode is disabled
2025-12-04T17:39:37-06:00	Notice	named	 04-Dec-2025 17:39:37.901 general: notice: all zones loaded
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.901 zoneload: info: zone localhost/IN: loaded serial 42
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.901 notify: info: zone rivera.prv/IN: sending notify to 192.168.3.252#53
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.900 notify: info: zone 168.192.in-addr.arpa/IN: sending notify to 192.168.3.252#53
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.899 zoneload: info: zone 0.ip6.arpa/IN: loaded serial 42
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.898 notify: info: zone rivera.prv/IN: sending notifies (serial 2511191541)
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.897 notify: info: zone 168.192.in-addr.arpa/IN: sending notifies (serial 2511191540)
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.896 zoneload: info: zone rivera.prv/IN: loaded serial 2511191541
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.896 zoneload: info: zone 168.192.in-addr.arpa/IN: loaded serial 2511191540
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.896 zoneload: info: zone 127.in-addr.arpa/IN: loaded serial 42
2025-12-04T17:39:37-06:00	Informational	named	 04-Dec-2025 17:39:37.889 zoneload: info: managed-keys-zone: loaded serial 79
2025-12-04T17:39:37-06:00	Notice	named	 04-Dec-2025 17:39:37.887 general: notice: command channel listening on 127.0.0.1#9530

Additional context My current solve has been to use rm to clean out the *.jnl files in /usr/local/etc/namedb/primary/ but running rndc sync -clean does the same thing, and I can only surmise more cleanly:

root@firewall-1:~ # cd /usr/local/etc/namedb/primary/
root@firewall-1:/usr/local/etc/namedb/primary # ls -l
total 96
-rw-r--r--  1 bind bind 1657 Dec  4 17:32 168.192.in-addr.arpa.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 bing.db
-rw-r--r--  1 bind bind  303 Dec  4 17:32 blacklist.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 duckduckgo.db
-rw-r--r--  1 bind bind  148 Dec  3 18:09 empty.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 google.db
-rw-r--r--  1 bind bind  158 Dec  3 18:09 localhost-forward.db
-rw-r--r--  1 bind bind  226 Dec  3 18:09 localhost-reverse.db
-rw-r-----  1 bind bind 1798 Dec  4 18:19 rivera.prv.db
-rw-r-----  1 bind bind 1175 Dec  4 18:05 rivera.prv.db.jnl
-rw-r--r--  1 bind bind  307 Dec  4 17:32 whitelist.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 youtube.db
root@firewall-1:/usr/local/etc/namedb/primary # rndc sync -clean
root@firewall-1:/usr/local/etc/namedb/primary # ls -l
total 88
-rw-r--r--  1 bind bind 1657 Dec  4 17:32 168.192.in-addr.arpa.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 bing.db
-rw-r--r--  1 bind bind  303 Dec  4 17:32 blacklist.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 duckduckgo.db
-rw-r--r--  1 bind bind  148 Dec  3 18:09 empty.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 google.db
-rw-r--r--  1 bind bind  158 Dec  3 18:09 localhost-forward.db
-rw-r--r--  1 bind bind  226 Dec  3 18:09 localhost-reverse.db
-rw-r-----  1 bind bind 1798 Dec  4 18:19 rivera.prv.db
-rw-r--r--  1 bind bind  307 Dec  4 17:32 whitelist.db
-rw-r--r--  1 bind bind  262 Dec  4 17:32 youtube.db

I looked for the script file in the src, plugins, and core repositories but couldn't find it. I'm sure I'm not looking in the right place or I'd have proposed a patch already.

Environment OPNsense 25.7.9-amd64 FreeBSD 14.3-RELEASE-p5 OpenSSL 3.0.18

drivera-armedia avatar Dec 05 '25 00:12 drivera-armedia

Upon further reading this might be an issue with the upstream package, assuming OPNSense uses their script to bring the service up or down.

If not, then this rndc sync -clear line needs to be added to the shutdown or pre-shutdown script invoked by OPNSense... I'm not familiar enough with the code to figure out what the best way to do this is.

drivera-armedia avatar Dec 05 '25 00:12 drivera-armedia

Upon further further reading, it seems the rndc stop call in the script should be sufficient to save and commit the journal files, and for this problem to not happen at all.

The fact that it's happening suggests that service named stop is not being invoked and instead the service PID might be getting killed unceremniously, vs. a clean shutdown via the startup/shutdown script.

Again - I don't know enough about the OPNSense wrappers for the O/S stuff to speak intelligently here about options, so I'll defer to you experts.

Cheers!

drivera-armedia avatar Dec 05 '25 00:12 drivera-armedia

I've created a PR (#5102) that seeks to address this issue or, at least, minimize its impact.

drivera73 avatar Dec 24 '25 17:12 drivera73