suricata icon indicating copy to clipboard operation
suricata copied to clipboard

util-logopenfile: Add log rotation thread to rotate during zero-traffic periods

Open Abhijeet94 opened this issue 5 months ago • 14 comments

Description

Fixes log rotation reliability by ensuring rotation checks occur even during periods with no log traffic. Previously, log files would only rotate when writes occurred, causing unpredictable rotation timing during zero-traffic periods.

This implementation consolidates log maintenance operations (rotation and flush) into a single unified thread, improving efficiency and maintainability.

Main Changes

Unified maintenance thread: Consolidates rotation and flush operations into a single log-maintenance thread that:

  • Checks rotation every 1 second by calling Write("", 0) on registered contexts
  • Performs flush operations at the configured heartbeat.output-flush-interval (if enabled)
  • Wakes every 500ms for responsive shutdown handling

Registration system: Added RegisterContextForLogRotation() to manage contexts requiring rotation checks, with a linked list tracked via log_rotation_next pointers

Skip I/O for empty buffers: Modified write functions to perform I/O only when buffer_len > 0, allowing rotation checks without actual writes

Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations

Architecture

The solution reuses existing rotation mechanisms without duplicating logic, ensuring compatibility with all current rotation features (SIGHUP, rotate-interval, filename patterns). The heartbeat thread to flush is re-used to trigger log rotation as well.

Ticket

https://redmine.openinfosecfoundation.org/issues/8115

Make sure these boxes are checked accordingly before submitting your Pull Request -- thank you.

Contribution style:

  • [X] I have read the contributing guide lines at https://docs.suricata.io/en/latest/devguide/contributing/contribution-process.html

Our Contribution agreements:

  • [X] I have signed the Open Information Security Foundation contribution agreement at https://suricata.io/about/contribution-agreement/ (note: this is only required once)

Changes (if applicable):

  • [ ] I have updated the User Guide (in doc/userguide/) to reflect the changes made
  • [ ] I have updated the JSON schema (in etc/schema.json) to reflect all logging changes (including schema descriptions)
  • [X] I have created a ticket at https://redmine.openinfosecfoundation.org/projects/suricata/issues

Abhijeet94 avatar Nov 18 '25 00:11 Abhijeet94

NOTE: This PR may contain new authors.

github-actions[bot] avatar Nov 18 '25 03:11 github-actions[bot]

Thanks for the conribution, could you check the CI failure ? (SV test about a big dataset)

catenacyber avatar Nov 18 '25 14:11 catenacyber

NOTE: This PR may contain new authors.

github-actions[bot] avatar Nov 18 '25 18:11 github-actions[bot]

Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?

victorjulien avatar Nov 18 '25 19:11 victorjulien

NOTE: This PR may contain new authors.

github-actions[bot] avatar Nov 18 '25 19:11 github-actions[bot]

Hi @Abhijeet94 , there seems to be quite a bit of overlap with this work that was merged in 8.0: #12527. Can that be used/extended address the issue?

Hi @victorjulien , it seems like the other PR is more about flushing of in memory data to disk rather than dealing with file close/log rotation. These two things seem to be independent since former can be required without any log rotation and log rotation may need to be triggered even if there is nothing new to flush.

I checked if there is something that can be re-used. The other PR introduces a function WorkerFlushLogs in log-flush.c that is called periodically. My understanding is that log rotation should be different from log-flush, plus it needs objects/processes defined in util-logopenfile.c, hence can't merge the two functionalities in that same worker thread.

Let me know if you think otherwise, thank you!

Abhijeet94 avatar Nov 18 '25 20:11 Abhijeet94

Codecov Report

:x: Patch coverage is 85.00000% with 9 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 84.21%. Comparing base (626027a) to head (823c31c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14343   +/-   ##
=======================================
  Coverage   84.20%   84.21%           
=======================================
  Files        1012     1012           
  Lines      261769   261804   +35     
=======================================
+ Hits       220415   220470   +55     
+ Misses      41354    41334   -20     
Flag Coverage Δ
fuzzcorpus 63.29% <25.00%> (-0.01%) :arrow_down:
livemode 18.77% <85.00%> (+0.04%) :arrow_up:
pcap 44.65% <85.00%> (-0.01%) :arrow_down:
suricata-verify 65.00% <86.44%> (+0.03%) :arrow_up:
unittests 59.22% <10.16%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 18 '25 20:11 codecov[bot]

@jlucovsky any thoughts on this? In my mind both efforts revolve around doing house keeping task at some interval. I'm wondering if we can just expand the flush logic to issue the "flush packets" to do more than only flush, but also trigger rotation and possibly other things in the future.

victorjulien avatar Nov 19 '25 11:11 victorjulien

Let me make an attempt to re-use the housekeeping thread now that I know that I can refactor to increase its scope considerably.

Abhijeet94 avatar Nov 19 '25 19:11 Abhijeet94

Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day

Abhijeet94 avatar Nov 20 '25 21:11 Abhijeet94

Do you know why the approval workflows are not running? Previously they used to start within a few hours. After my recent commit it hasn't started in a day

We're all at suricon so paying less attention to github this week. I'll have a look now.

victorjulien avatar Nov 20 '25 21:11 victorjulien

NOTE: This PR may contain new authors.

github-actions[bot] avatar Nov 20 '25 21:11 github-actions[bot]

"Fixed double rotation bug: Ensured rotate_time is updated for both rotation_flag and interval-based rotations"

This should go into it's own commit.

victorjulien avatar Nov 24 '25 09:11 victorjulien

I've set the PR to draft as the git history is messy and in general we need a new PR when feedback is implemented.

victorjulien avatar Nov 24 '25 09:11 victorjulien