util-logopenfile: Add log rotation thread to rotate during zero-traffic periods
Description
Fixes log rotation reliability by ensuring rotation checks occur even during periods with no log traffic. Previously, log files would only rotate when writes occurred, causing unpredictable rotation timing during zero-traffic periods.
This implementation consolidates log maintenance operations (rotation and flush) into a single unified thread, improving efficiency and maintainability.
Main Changes
Unified maintenance thread: Consolidates rotation and flush operations into a single log-maintenance thread that:
- Checks rotation every 1 second by calling
Write("", 0)on registered contexts - Performs flush operations at the configured
heartbeat.output-flush-interval(if enabled) - Wakes every 500ms for responsive shutdown handling
Registration system: Added RegisterContextForLogRotation() to manage contexts requiring rotation checks, with a linked list tracked via log_rotation_next pointers
Skip I/O for empty buffers: Modified write functions to perform I/O only when buffer_len > 0, allowing rotation checks without actual writes
Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations
Architecture
The solution reuses existing rotation mechanisms without duplicating logic, ensuring compatibility with all current rotation features (SIGHUP, rotate-interval, filename patterns). The heartbeat thread to flush is re-used to trigger log rotation as well.
Ticket
https://redmine.openinfosecfoundation.org/issues/8115
Make sure these boxes are checked accordingly before submitting your Pull Request -- thank you.
Contribution style:
- [X] I have read the contributing guide lines at https://docs.suricata.io/en/latest/devguide/contributing/contribution-process.html
Our Contribution agreements:
- [X] I have signed the Open Information Security Foundation contribution agreement at https://suricata.io/about/contribution-agreement/ (note: this is only required once)
Changes (if applicable):
- [ ] I have updated the User Guide (in doc/userguide/) to reflect the changes made
- [ ] I have updated the JSON schema (in etc/schema.json) to reflect all logging changes (including schema descriptions)
- [X] I have created a ticket at https://redmine.openinfosecfoundation.org/projects/suricata/issues
NOTE: This PR may contain new authors.
Thanks for the conribution, could you check the CI failure ? (SV test about a big dataset)
NOTE: This PR may contain new authors.
Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?
NOTE: This PR may contain new authors.
Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?
Hi @victorjulien , it seems like the other PR is more about flushing of in memory data to disk rather than dealing with file close/log rotation. These two things seem to be independent since former can be required without any log rotation and log rotation may need to be triggered even if there is nothing new to flush.
I checked if there is something that can be re-used. The other PR introduces a function WorkerFlushLogs in log-flush.c that is called periodically. My understanding is that log rotation should be different from log-flush, plus it needs objects/processes defined in util-logopenfile.c, hence can't merge the two functionalities in that same worker thread.
Let me know if you think otherwise, thank you!
Codecov Report
:x: Patch coverage is 85.00000% with 9 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 84.21%. Comparing base (626027a) to head (823c31c).
Additional details and impacted files
@@ Coverage Diff @@
## main #14343 +/- ##
=======================================
Coverage 84.20% 84.21%
=======================================
Files 1012 1012
Lines 261769 261804 +35
=======================================
+ Hits 220415 220470 +55
+ Misses 41354 41334 -20
| Flag | Coverage Δ | |
|---|---|---|
| fuzzcorpus | 63.29% <25.00%> (-0.01%) |
:arrow_down: |
| livemode | 18.77% <85.00%> (+0.04%) |
:arrow_up: |
| pcap | 44.65% <85.00%> (-0.01%) |
:arrow_down: |
| suricata-verify | 65.00% <86.44%> (+0.03%) |
:arrow_up: |
| unittests | 59.22% <10.16%> (-0.02%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@jlucovsky any thoughts on this? In my mind both efforts revolve around doing house keeping task at some interval. I'm wondering if we can just expand the flush logic to issue the "flush packets" to do more than only flush, but also trigger rotation and possibly other things in the future.
Let me make an attempt to re-use the housekeeping thread now that I know that I can refactor to increase its scope considerably.
Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day
Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day
We're all at suricon so paying less attention to github this week. I'll have a look now.
NOTE: This PR may contain new authors.
"Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations"
This should go into it's own commit.
I've set the PR to draft as the git history is messy and in general we need a new PR when feedback is implemented.