Input is not paused with Storage.type Filesystem and Storage.pause_on_chunks_overlimit On
Bug Report
Describe the bug
With the settings
Storage.type Filesystem
Storage.pause_on_chunks_overlimit On
the input is (almost) never paused. According to the documentation (https://docs.fluentbit.io/manual/administration/backpressure#about-pause-and-resume-callbacks), the input should be paused when Storage.pause_on_chunks_overlimit is enabled and Storage.max_chunks_up is reached. The default value of Storage.max_chunks_up is 128. The inputs are indeed paused when I set it to a very low value (for example, Storage.max_chunks_up 5).
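For reference, a minimal classic-mode configuration with the settings described above could look roughly like this. It is only a sketch: the tail input, both paths, and the use of "ingress" as an alias (guessed from the metrics excerpt in this report) are assumptions, not the actual setup. Only the storage.* values come from the report.

```text
[SERVICE]
    flush                   1
    # Hypothetical directory for filesystem-buffered chunks
    storage.path            /var/lib/fluent-bit/storage
    # Documented default; setting this to 5 made the inputs pause
    storage.max_chunks_up   128
    # Expose storage metrics over the built-in HTTP server
    storage.metrics         on
    http_server             on

[INPUT]
    # The input plugin and path are placeholders
    name                                tail
    alias                               ingress
    path                                /var/log/app/*.log
    storage.type                        filesystem
    storage.pause_on_chunks_overlimit   on
```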
I never saw the number of up chunks grow beyond ~10. It then decreases again while the number of down chunks rises.
I do not want the down chunks to be overwritten/rotated, so I don't want to set storage.total_limit_size.
The input should simply pause (which is what Storage.pause_on_chunks_overlimit is intended for), but that cannot happen if the number of up chunks never grows and only the number of down chunks rises.
https://github.com/fluent/fluent-bit/blob/7233f9ae360385435a8c52ea7830395b17c3fb68/src/flb_input_chunk.c#L1206-L1212
...{
  "storage_layer": {
    "chunks": {
      "total_chunks": 727,
      "mem_chunks": 1,
      "fs_chunks": 726,
      "fs_chunks_up": 5,
      "fs_chunks_down": 721
    }
  },
  ...
  "ingress": {
    "status": {
      "overlimit": false,
      "mem_size": "7.8K",
      "mem_limit": "0b"
    },
    "chunks": {
      "total": 726,
      "up": 5,
      "down": 721,
      "busy": 725,
      "busy_size": "72.8K"
    }
  }
...
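To make the numbers above easier to compare against the limit, here is a small Python sketch that summarizes the per-input chunk counters from such a metrics document. It is an illustration under assumptions: the parent key "input_chunks" is assumed (that part is elided in the excerpt above), summarize_chunks is a hypothetical helper, and the comparison against max_chunks_up is a plain threshold check that does not reproduce Fluent Bit's internal pause logic.

```python
def summarize_chunks(metrics: dict, max_chunks_up: int = 128) -> dict:
    """Summarize up/down chunk counts per input.

    Assumes per-input entries live under an "input_chunks" key; that key
    is elided in the excerpt, so treat it as an assumption.
    """
    summary = {}
    for name, stats in metrics.get("input_chunks", {}).items():
        chunks = stats.get("chunks", {})
        up = chunks.get("up", 0)
        summary[name] = {
            "up": up,
            "down": chunks.get("down", 0),
            # Simple threshold comparison, not Fluent Bit's own check
            "up_at_limit": up >= max_chunks_up,
            "overlimit": stats.get("status", {}).get("overlimit", False),
        }
    return summary


# Values copied from the excerpt in this report
sample = {
    "storage_layer": {
        "chunks": {
            "total_chunks": 727,
            "mem_chunks": 1,
            "fs_chunks": 726,
            "fs_chunks_up": 5,
            "fs_chunks_down": 721,
        }
    },
    "input_chunks": {
        "ingress": {
            "status": {"overlimit": False, "mem_size": "7.8K", "mem_limit": "0b"},
            "chunks": {
                "total": 726,
                "up": 5,
                "down": 721,
                "busy": 725,
                "busy_size": "72.8K",
            },
        }
    },
}

if __name__ == "__main__":
    for limit in (128, 5):
        print(limit, summarize_chunks(sample, max_chunks_up=limit))
```

With the default limit of 128 the 5 up chunks are far below the threshold, while a limit of 5 is reached, which matches the observation that the inputs only pause with a very low Storage.max_chunks_up.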
Is there an explanation for that behavior?
While testing I also found another bug regarding config parsing for Storage.pause_on_chunks_overlimit and provided a fix here: https://github.com/fluent/fluent-bit/pull/8720
cc @pwhelan @edsiper @agup006
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Hi, what exactly are you trying to achieve? If the number of chunks up remains low, it means it can handle the ingestion rate. Why do you think the output should be paused in this situation?
This issue was closed because it has been stalled for 5 days with no activity.
A very good question indeed from drbugfinder-work. It is surprising that it received no attention.