fluentd
Provide a documented procedure for restoring the otherwise useless backup chunk files created by unrecoverable errors
Describe the bug
Description:
This is a kind of documentation bug, and it can also lead to unexpected results from the chunk backup feature that handles unrecoverable errors.
First of all, the host can become unstable when tmpfs fills up with chunk backup files that were moved there because of unrecoverable errors. The following scenario is possible, and I have actually run into it:
1. Chunk backup files keep accumulating in tmpfs or the backup directory.[0]
2. The host runs into trouble because the disk runs out of space or, with tmpfs, virtual memory is exhausted.
To mitigate this, we need to periodically restore the chunk backup files and then remove them. However, the chunk backup files are binary, so they are hard to restore as they are, and the official Fluentd docs give no concrete procedure for restoring them. As a result, the chunk backup files are useless: they only consume host resources and can cause further trouble.
Q. How can backup chunk files created by unrecoverable errors be restored?
Could you please provide a concrete procedure for restoring the chunk backup files?
[0] Handling Unrecoverable Errors
If these kinds of fatal errors occur, Fluentd will abort the chunk immediately and move it into secondary or the backup directory.
To Reproduce
For example,
Use the cloudwatch_logs output plugin (v0.14.2+) and keep generating log events larger than 256 KB, which is the CloudWatch hard limit on message size, until the storage holding the backup directory runs out. You will then see out-of-space or out-of-memory problems (when the backup directory is on tmpfs) on the host running the Fluentd agent.
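To make the trigger condition concrete, here is a minimal sketch of the size check, assuming the limit is exactly 256 KiB as described above. The helper name `exceeds_limit` is made up for illustration, and the real plugin may also account for per-event overhead.

```python
# Assumption: the limit is 256 KiB per log event, as stated in this report.
# The actual plugin may count additional per-event overhead.
MAX_EVENT_SIZE = 256 * 1024


def exceeds_limit(message: str, limit: int = MAX_EVENT_SIZE) -> bool:
    """Return True if the UTF-8 encoded message is larger than the limit."""
    # The limit applies to bytes, not characters, so encode before measuring.
    return len(message.encode("utf-8")) > limit
```

Any event for which this returns True would be expected to hit the error path described in this issue.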
Expected behavior
To mitigate the above problem without losing any logs, we need a way to restore the chunk backup files before removing them.
1. Check whether there are enough resources left to store chunk backup files in the backup directory.
2. If free space is running out, restore the chunk backup files by following a documented procedure in the Fluentd docs before removing them.
3. Remove the chunk backup files to reclaim storage/memory.
Currently, what we need is step 2.
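For illustration, the check and removal steps around the missing restore step could be sketched roughly as below. This is not an official procedure: the `*.log` file pattern, the free-space threshold, and the `restore` callback are all assumptions, and the callback is a placeholder for exactly the procedure this issue asks to have documented.

```python
import shutil
from pathlib import Path


def free_ratio(path):
    """Fraction of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 1.0


def oldest_backup_chunks(backup_dir, pattern="*.log", limit=10):
    """Return up to `limit` backup files, oldest first (pattern is an assumption)."""
    files = [p for p in Path(backup_dir).rglob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[:limit]


def reclaim(backup_dir, restore=None, min_free_ratio=0.1, limit=10):
    """Restore, then remove, the oldest backup chunks when free space is low.

    `restore` is a hypothetical callback taking a Path and returning True on
    success. A chunk is deleted only after it has been restored.
    """
    if free_ratio(backup_dir) >= min_free_ratio:
        return []  # enough space left, nothing to do
    removed = []
    for chunk in oldest_backup_chunks(backup_dir, limit=limit):
        if restore is None or not restore(chunk):
            continue  # never delete a chunk that was not restored
        chunk.unlink()
        removed.append(chunk)
    return removed
```

Without a working `restore` callback, this sketch deliberately removes nothing, which is the current situation described in this issue.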
Your Environment
- Fluentd version:
1.14.6
- TD Agent version:
N/A
- Operating system:
RHEL8
- Kernel version:
4.18.0-348
Your Configuration
This issue does not depend on a particular fluentd.conf. It depends on the Fluentd behavior described in [0].
[0] Handling Unrecoverable Errors
If these kinds of fatal errors occur, Fluentd will abort the chunk immediately and move it into secondary or the backup directory.
Your Error Log
Any Fluent::UnrecoverableError log is related to this issue.
https://github.com/fluent-plugins-nursery/fluent-plugin-cloudwatch-logs/blob/7287d1ae78b24e3fb74aee8d3830a65ecd89f65d/lib/fluent/plugin/out_cloudwatch_logs.rb#L382
For example, while using cloudwatch_logs as an output plugin, the following error message is shown.
"Log event in #{group_name} is discarded because it is too large: #{event_bytesize} bytes exceeds limit of #{MAX_EVENT_SIZE}"
Additional context
No response