os-bind: journals for primary zones can get corrupted on a reboot
Important notices Before you add a new report, we ask you kindly to acknowledge the following:
- [X] I have read the contributing guide lines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
- [X] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
- [X] The title contains the plugin to which this issue belongs
Describe the bug I've recently migrated my DNS servers into OPNSense using bind, where one firewall is the primary (i.e. the zone owner) and the other has the failover replicated zone. Everything works fine and dandy right up until I have to reboot the primary (i.e. for an update).
Upon reboot, bind won't come up cleanly because the journal files for the primary zones will somehow get corrupted.
To Reproduce
It's not 100% of the time, but it happens often enough that it's a nuisance.
- Enable Bind DNS
- Create a primary zone
- Add a few records to the primary zone
- Reboot
- Bind will start but will complain that the journal for the primary zone is corrupted
Expected behavior Bind comes up cleanly and continues serving DNS records from its configured primary zones.
Reading around, it seems that running rndc sync -clean prior to rndc stop would be sufficient to curb this behavior since it would force Bind to sync out the journal files prior to shutdown, like so:
# from /usr/local/etc/rc.d/named
echo 'Stopping named.'
rndc sync -clean || true # <---- add this line!
if rndc stop; then
wait_for_pids ${rc_pid}
else
Relevant log files
2025-12-04T17:32:23-06:00 Notice named 04-Dec-2025 17:32:23.296 general: notice: running
2025-12-04T17:32:23-06:00 Notice named 04-Dec-2025 17:32:23.295 general: notice: FIPS mode is disabled
2025-12-04T17:32:23-06:00 Notice named 04-Dec-2025 17:32:23.294 general: notice: all zones loaded
2025-12-04T17:32:23-06:00 Informational named 04-Dec-2025 17:32:23.293 zoneload: info: zone 168.192.in-addr.arpa/IN: loaded serial 2511191540
2025-12-04T17:32:23-06:00 Error named 04-Dec-2025 17:32:23.291 zoneload: error: zone rivera.prv/IN: not loaded due to errors.
2025-12-04T17:32:23-06:00 Error named 04-Dec-2025 17:32:23.291 zoneload: error: zone rivera.prv/IN: journal rollforward failed: journal out of sync with zone
2025-12-04T17:32:23-06:00 Informational named 04-Dec-2025 17:32:23.291 zoneload: info: zone 127.in-addr.arpa/IN: loaded serial 42
2025-12-04T17:32:23-06:00 Informational named 04-Dec-2025 17:32:23.289 zoneload: info: zone localhost/IN: loaded serial 42
2025-12-04T17:32:23-06:00 Informational named 04-Dec-2025 17:32:23.286 zoneload: info: zone 0.ip6.arpa/IN: loaded serial 42
2025-12-04T17:32:23-06:00 Informational named 04-Dec-2025 17:32:23.278 zoneload: info: managed-keys-zone: loaded serial 78
2025-12-04T17:32:23-06:00 Notice named 04-Dec-2025 17:32:23.276 general: notice: command channel listening on 127.0.0.1#9530
Later, after manually cleaning the files out (via rm):
2025-12-04T17:39:37-06:00 Notice named 04-Dec-2025 17:39:37.902 general: notice: running
2025-12-04T17:39:37-06:00 Notice named 04-Dec-2025 17:39:37.902 general: notice: FIPS mode is disabled
2025-12-04T17:39:37-06:00 Notice named 04-Dec-2025 17:39:37.901 general: notice: all zones loaded
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.901 zoneload: info: zone localhost/IN: loaded serial 42
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.901 notify: info: zone rivera.prv/IN: sending notify to 192.168.3.252#53
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.900 notify: info: zone 168.192.in-addr.arpa/IN: sending notify to 192.168.3.252#53
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.899 zoneload: info: zone 0.ip6.arpa/IN: loaded serial 42
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.898 notify: info: zone rivera.prv/IN: sending notifies (serial 2511191541)
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.897 notify: info: zone 168.192.in-addr.arpa/IN: sending notifies (serial 2511191540)
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.896 zoneload: info: zone rivera.prv/IN: loaded serial 2511191541
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.896 zoneload: info: zone 168.192.in-addr.arpa/IN: loaded serial 2511191540
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.896 zoneload: info: zone 127.in-addr.arpa/IN: loaded serial 42
2025-12-04T17:39:37-06:00 Informational named 04-Dec-2025 17:39:37.889 zoneload: info: managed-keys-zone: loaded serial 79
2025-12-04T17:39:37-06:00 Notice named 04-Dec-2025 17:39:37.887 general: notice: command channel listening on 127.0.0.1#9530
Additional context
My current solve has been to use rm to clean out the *.jnl files in /usr/local/etc/namedb/primary/ but running rndc sync -clean does the same thing, and I can only surmise more cleanly:
root@firewall-1:~ # cd /usr/local/etc/namedb/primary/
root@firewall-1:/usr/local/etc/namedb/primary # ls -l
total 96
-rw-r--r-- 1 bind bind 1657 Dec 4 17:32 168.192.in-addr.arpa.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 bing.db
-rw-r--r-- 1 bind bind 303 Dec 4 17:32 blacklist.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 duckduckgo.db
-rw-r--r-- 1 bind bind 148 Dec 3 18:09 empty.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 google.db
-rw-r--r-- 1 bind bind 158 Dec 3 18:09 localhost-forward.db
-rw-r--r-- 1 bind bind 226 Dec 3 18:09 localhost-reverse.db
-rw-r----- 1 bind bind 1798 Dec 4 18:19 rivera.prv.db
-rw-r----- 1 bind bind 1175 Dec 4 18:05 rivera.prv.db.jnl
-rw-r--r-- 1 bind bind 307 Dec 4 17:32 whitelist.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 youtube.db
root@firewall-1:/usr/local/etc/namedb/primary # rndc sync -clean
root@firewall-1:/usr/local/etc/namedb/primary # ls -l
total 88
-rw-r--r-- 1 bind bind 1657 Dec 4 17:32 168.192.in-addr.arpa.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 bing.db
-rw-r--r-- 1 bind bind 303 Dec 4 17:32 blacklist.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 duckduckgo.db
-rw-r--r-- 1 bind bind 148 Dec 3 18:09 empty.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 google.db
-rw-r--r-- 1 bind bind 158 Dec 3 18:09 localhost-forward.db
-rw-r--r-- 1 bind bind 226 Dec 3 18:09 localhost-reverse.db
-rw-r----- 1 bind bind 1798 Dec 4 18:19 rivera.prv.db
-rw-r--r-- 1 bind bind 307 Dec 4 17:32 whitelist.db
-rw-r--r-- 1 bind bind 262 Dec 4 17:32 youtube.db
I looked for the script file in the src, plugins, and core repositories but couldn't find it. I'm sure I'm not looking in the right place or I'd have proposed a patch already.
Environment OPNsense 25.7.9-amd64 FreeBSD 14.3-RELEASE-p5 OpenSSL 3.0.18
Upon further reading this might be an issue with the upstream package, assuming OPNSense uses their script to bring the service up or down.
If not, then this rndc sync -clear line needs to be added to the shutdown or pre-shutdown script invoked by OPNSense... I'm not familiar enough with the code to figure out what the best way to do this is.
Upon further further reading, it seems the rndc stop call in the script should be sufficient to save and commit the journal files, and for this problem to not happen at all.
The fact that it's happening suggests that service named stop is not being invoked and instead the service PID might be getting killed unceremniously, vs. a clean shutdown via the startup/shutdown script.
Again - I don't know enough about the OPNSense wrappers for the O/S stuff to speak intelligently here about options, so I'll defer to you experts.
Cheers!
I've created a PR (#5102) that seeks to address this issue or, at least, minimize its impact.