netflow insight aggregator dies and cannot be restarted, reason: DatabaseError: database disk image is malformed
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [v] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [v] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
I found no 'breaking changes' documented on the OPNsense release pages.
The Insight aggregator stops and cannot be started again.
This has been going on for quite some time. The reported fix is to 'delete all netflow data', which does not sit well with me.
This looks like a recurring bug: quite a few posts on this board report that the same error happened in the past.
flowd_aggregate.py: flowd aggregate died with message
Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 162, in run
    aggregate_flowd(self.config, do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 80, in aggregate_flowd
    stream_agg_object.add(copy.copy(flow_record))
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/source.py", line 69, in add
    super(FlowSourceAddrTotals, self).add(flow)
  File "/usr/local/opnsense/scripts/netflow/lib/aggregates/__init__.py", line 187, in add
    self._update_cur.execute(self._insert_stmt, flow)
sqlite3.DatabaseError: database disk image is malformed
/var/netflow shows no broken files or lock files left behind
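A malformed SQLite file usually looks normal in a directory listing, so it may help to ask SQLite itself which file is damaged. The following is a minimal sketch, not part of OPNsense: it assumes the aggregate databases under /var/netflow use a `.sqlite` extension, and the function names `check_sqlite_file` and `check_directory` are made up for illustration. It opens each file read-only and runs `PRAGMA integrity_check`.

```python
import glob
import os
import sqlite3
import urllib.parse


def check_sqlite_file(path):
    """Return the problems SQLite reports for one file (empty list if it is OK)."""
    # Open read-only through a URI so the check itself cannot modify the file.
    uri = "file:" + urllib.parse.quote(os.path.abspath(path)) + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return [str(exc)]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files fail before the check can even run.
        return [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages


def check_directory(directory="/var/netflow", pattern="*.sqlite"):
    """Map every matching file in the directory to its list of problems."""
    # The directory and the file pattern are assumptions; adjust as needed.
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        results[path] = check_sqlite_file(path)
    return results


if __name__ == "__main__":
    for path, problems in check_directory().items():
        print(path, "OK" if not problems else "; ".join(problems))
```

Running this against a copy of the directory rather than the live files avoids interfering with a running aggregator.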
Workaround (with data loss):
cd /usr/local/opnsense/scripts/netflow/
./flush_all.sh all
To Reproduce
Steps to reproduce the behavior:
- Unknown; I assume something is broken in the code base
- Possibly triggered by an upgrade?
Expected behavior
The 'insight aggregator' does not stop running.
sqlite3 files under /var/netflow are not reported as "DatabaseError: database disk image is malformed".
Describe alternatives you considered
Restarting the flowd_aggregator service from the command line (did not work).
Screenshots
n/a
Relevant log files
see the message extract above
Additional context
Observed since the upgrade to 25.10.
Environment
Software version used and hardware type if relevant, e.g.:
OPNsense 25.10 (amd64). KVM virtual machine virtio network-card-driver
Maybe you have an idea how the database could have been corrupted? If an SQLite file gets corrupted, it is unlikely to recover correctly. This happens especially with UFS and/or power outages / unclean shutdowns.
Cheers, Franco
It is not unlikely that an unclean shutdown happened: the Proxmox hypervisor keeps running into issues after a few weeks, and clean shutdowns do not always happen.
I do assume that detecting a malformed SQLite database should trigger some kind of remediation procedure.
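As an illustration of what such a remediation step could look like, here is a hedged sketch that moves an unreadable database aside instead of deleting it. It is not how OPNsense behaves today; the helper name `quarantine_if_malformed` and the `.malformed.<timestamp>` suffix are invented for this example, and it assumes the aggregator recreates a missing database file on its next run.

```python
import os
import sqlite3
import time


def quarantine_if_malformed(path):
    """Rename the file aside if SQLite cannot read it.

    Returns the new path when the file was moved, otherwise None.
    """
    if not os.path.isfile(path):
        return None  # nothing to check; avoid creating an empty database
    try:
        conn = sqlite3.connect(path)
        try:
            # quick_check is a faster, less thorough variant of integrity_check.
            result = conn.execute("PRAGMA quick_check").fetchone()
        finally:
            conn.close()
        if result is not None and result[0] == "ok":
            return None
    except sqlite3.DatabaseError:
        pass  # unreadable file: fall through to quarantine
    # Keep the damaged file for later inspection instead of deleting it.
    # Sidecar files such as -journal or -wal are not handled in this sketch.
    new_path = f"{path}.malformed.{int(time.time())}"
    os.rename(path, new_path)
    return new_path
```

The design choice here is to lose only the affected aggregate rather than all netflow data, while keeping the damaged file available for a later recovery attempt.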
I did not try to delete the last record added; could that possibly fix this issue as well?
If there is a procedure for that which I can test, I'm open to doing so. I kept a copy of the data in /var/netflow and /var/log/netflow for that purpose.
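For experimenting on the saved copy, one option is a best-effort salvage: read whatever rows SQLite can still return and write them into a fresh database. The sketch below is only an assumption about what might work, not a documented OPNsense procedure; `salvage` and `quote_ident` are hypothetical names, indexes are not recreated, and a table whose pages are damaged is copied only up to the first read error. If the `sqlite3` command-line shell is available, its `.recover` command is another thing to try.

```python
import os
import sqlite3
import urllib.parse


def quote_ident(name):
    """Quote a table name for use inside an SQL statement."""
    return '"' + name.replace('"', '""') + '"'


def salvage(src_path, dst_path):
    """Copy every readable row from src_path into a new database at dst_path.

    Returns a dict mapping table name to the number of rows copied.
    The source is opened read-only, so the saved copy is never modified.
    """
    uri = "file:" + urllib.parse.quote(os.path.abspath(src_path)) + "?mode=ro"
    src = sqlite3.connect(uri, uri=True)
    dst = sqlite3.connect(dst_path)
    copied = {}
    try:
        # If the schema table itself is unreadable this raises and nothing is salvaged.
        tables = src.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' AND sql IS NOT NULL"
        ).fetchall()
        for name, create_sql in tables:
            dst.execute(create_sql)
            count = 0
            try:
                for row in src.execute(f"SELECT * FROM {quote_ident(name)}"):
                    marks = ", ".join("?" * len(row))
                    dst.execute(
                        f"INSERT INTO {quote_ident(name)} VALUES ({marks})", row
                    )
                    count += 1
            except sqlite3.Error as exc:
                # Keep the rows read so far and move on to the next table.
                print(f"{name}: stopped after {count} rows: {exc}")
            copied[name] = count
        dst.commit()
    finally:
        src.close()
        dst.close()
    return copied
```

A call such as `salvage("/path/to/copy/file.sqlite", "/path/to/copy/file.salvaged.sqlite")` (both paths are placeholders) leaves the original copy untouched, so the attempt can be repeated or compared against other approaches.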