
Storage.type Filesystem & Storage.pause_on_chunks_overlimit does not pause input

Open drbugfinder-work opened this issue 1 year ago • 4 comments

Bug Report

Describe the bug

With the settings

Storage.type Filesystem
Storage.pause_on_chunks_overlimit On

the input is (almost) never paused. According to the documentation (https://docs.fluentbit.io/manual/administration/backpressure#about-pause-and-resume-callbacks), the input should be paused when Storage.pause_on_chunks_overlimit is enabled and Storage.max_chunks_up is reached. The default value for Storage.max_chunks_up is 128. The inputs are indeed paused when I set it to a very low value (for example, Storage.max_chunks_up 5). However, I never saw the number of up chunks grow beyond ~10. It then decreases again while the number of down chunks rises.

I do not want the down chunks to be overwritten/rotated, so I don't want to set storage.total_limit_size. The input should simply pause (which is what Storage.pause_on_chunks_overlimit is intended for), but this cannot work if the number of up chunks never grows and only the number of down chunks rises.
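To make the expectation concrete, here is a minimal sketch of the pause condition as the documentation describes it. It is a toy model, not the code in flb_input_chunk.c: the names `ChunkCounts` and `should_pause` are hypothetical, and the `>=` comparison is an assumption about what "reached" means.

```python
from dataclasses import dataclass


@dataclass
class ChunkCounts:
    up: int    # chunks currently loaded in memory
    down: int  # chunks that exist only on the filesystem


def should_pause(counts: ChunkCounts, max_chunks_up: int = 128,
                 pause_on_chunks_overlimit: bool = True) -> bool:
    """Hypothetical model: pause only when the up-chunk count reaches the limit.

    The number of down chunks does not enter the decision at all.
    """
    return pause_on_chunks_overlimit and counts.up >= max_chunks_up


reported = ChunkCounts(up=5, down=721)
print(should_pause(reported))                   # default limit of 128
print(should_pause(reported, max_chunks_up=5))  # the low limit that did pause
```

Under this reading, the reported counters (5 up, 721 down) never trigger a pause with the default limit, no matter how many down chunks accumulate, which matches the behavior described here.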

https://github.com/fluent/fluent-bit/blob/7233f9ae360385435a8c52ea7830395b17c3fb68/src/flb_input_chunk.c#L1206-L1212

...{
  "storage_layer": {
    "chunks": {
      "total_chunks": 727,
      "mem_chunks": 1,
      "fs_chunks": 726,
      "fs_chunks_up": 5,
      "fs_chunks_down": 721
    }
  },
...
    "ingress": {
      "status": {
        "overlimit": false,
        "mem_size": "7.8K",
        "mem_limit": "0b"
      },
      "chunks": {
        "total": 726,
        "up": 5,
        "down": 721,
        "busy": 725,
        "busy_size": "72.8K"
      }
    }
...
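As a rough aid for reading these counters, the sketch below parses only the ingress "chunks" object shown above and reports how far the up-chunk count is from the default Storage.max_chunks_up of 128. The helper name `summarize` and the derived fields are illustrative assumptions, not part of any Fluent Bit API.

```python
import json

# The ingress "chunks" object from the output above, copied as a JSON string.
INGRESS_CHUNKS = (
    '{"total": 726, "up": 5, "down": 721, "busy": 725, "busy_size": "72.8K"}'
)


def summarize(chunks_json: str, max_chunks_up: int = 128) -> dict:
    """Hypothetical helper: derive a few figures from the chunk counters."""
    chunks = json.loads(chunks_json)
    return {
        # up + down should account for every chunk of this input
        "consistent": chunks["up"] + chunks["down"] == chunks["total"],
        # how many more up chunks would be needed to reach the limit
        "headroom": max_chunks_up - chunks["up"],
        # fraction of chunks that live only on disk
        "down_share": chunks["down"] / chunks["total"],
    }


summary = summarize(INGRESS_CHUNKS)
print(summary)
```

With these numbers the up-chunk count is 123 below the default limit while more than 99% of the chunks are down.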

Is there an explanation for that behavior?

While testing I also found another bug regarding config parsing for Storage.pause_on_chunks_overlimit and provided a fix here: https://github.com/fluent/fluent-bit/pull/8720

drbugfinder-work, Apr 16 '24 13:04

cc @pwhelan @edsiper @agup006

drbugfinder-work, Apr 16 '24 13:04

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot], Jul 16 '24 01:07

Hi, what exactly are you trying to achieve? If the number of chunks up remains low, it means it can handle the ingestion rate. Why do you think the output should be paused in this situation?

RicharddeJong, Jul 20 '24 05:07

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot], Oct 20 '24 02:10

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot], Oct 26 '24 02:10

A very good question indeed from drbugfinder-work. It is odd that it received no attention.

pavlov2000uk, Jul 18 '25 09:07