fluent-bit-docs

Data Pipeline: documentation doesn't match implementation

Open braydonk opened this issue 2 years ago • 5 comments

I am currently looking through the Data Pipeline section of Fluent Bit's documentation. In particular, I am looking at this graph:

Fluent Bit Data Pipeline graph: input to parser to filter to buffer to routing to outputs

I have two issues with this diagram.

Buffering too late in the pipeline

The Buffer step's location in the pipeline doesn't seem right to me. My understanding is that Fluent Bit buffers data at input time. Input data is added to chunks stored per input, which are subsequently flushed to the next section of the pipeline at a scheduled interval. Filtering can rewrite the chunks when records are modified or dropped, but the data is not newly buffered after the filtering section. It is already in chunks, and those chunks are flushed and routed to output destinations.

It is confusing for buffering to be a separate section here. At the very least, I think it would make more sense for buffering to be attached to the input part of the pipeline.
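
As a rough illustration of why buffering reads as an input-side concern, here is a minimal classic-mode configuration sketch. It assumes the `tail` input and filesystem buffering; the log path and storage path are hypothetical placeholders. The buffering mode is chosen on the input itself, while the flush interval is a service-level setting.

```
# Flush is the interval, in seconds, at which buffered chunks are flushed.
# storage.path (hypothetical location) is where filesystem chunks are kept.
[SERVICE]
    Flush        1
    storage.path /var/lib/fluent-bit/storage

# The buffering mode is selected per input via storage.type.
[INPUT]
    Name         tail
    Path         /var/log/app.log
    storage.type filesystem
```

Nothing in this sketch configures a standalone buffer stage between filters and routing, which is consistent with the reading above.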

Parsing as a separate section

Placing parsing at that step of the pipeline does make sense as a best practice, but if this section of the documentation is intended to be a direct representation of Fluent Bit's data pipeline, then showing Parser as a separate step seems inaccurate to me. Some input plugins support attaching a parser as part of ingestion (in_tail, for example), but otherwise a parser is applied as a filter rather than as a separate part of the pipeline. It's best practice for parsing to occur as early as is reasonable so that other filter operations can be applied effectively, but it is not a distinct step of the pipeline in the implementation.

The docs in that section seem to reinforce this inaccuracy. Quoting from the docs:

The Parser allows you to convert from unstructured to structured data.

What is The Parser here? The writing here suggests that The Parser (capitalized as a proper noun) is some distinct unit of the pipeline, and not something that is part of either input or filtering.
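
The two ways a parser actually gets applied can be sketched in classic-mode configuration. This is only an illustration: the log paths and tags are hypothetical, and it assumes a parser named `json` is defined in the referenced `parsers.conf`.

```
[SERVICE]
    Parsers_File parsers.conf

# Option 1: the parser is attached to the input plugin at ingestion.
[INPUT]
    Name   tail
    Path   /var/log/app-a.log
    Tag    app.a
    Parser json

# Option 2: raw lines are ingested as-is, then parsed by a filter.
[INPUT]
    Name   tail
    Path   /var/log/app-b.log
    Tag    app.b

[FILTER]
    Name     parser
    Match    app.b
    Key_Name log
    Parser   json
```

In both cases the parser is configured as part of an input or a filter; neither sketch involves a separate parser section in the pipeline.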


The section about parsing could make sense if these docs were originally intended to represent not the actual internal Fluent Bit pipeline but rather some kind of "ideal pipeline setup". If that is the intention of these docs, then I believe it should be stated more directly in the introduction to the section.

However, whether these docs are intended to represent a best practice setup OR the actual internals of the Fluent Bit pipeline, I believe the buffering section is inaccurate based on my understanding of the actual implementation.

braydonk avatar Sep 25 '23 17:09 braydonk

CC @pwhelan with whom I discussed the concerns before opening an issue.

braydonk avatar Sep 25 '23 17:09 braydonk

Tagging @patrick-stephens and @lecaros per suggestion in Slack

braydonk avatar Sep 27 '23 14:09 braydonk

I think the best option here, @braydonk, is to suggest updates via a PR. Realistically, that is the main way it will actually get done, and a PR is much more concrete to review.

patrick-stephens avatar Oct 02 '23 10:10 patrick-stephens

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Jan 01 '24 01:01 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jan 07 '24 01:01 github-actions[bot]

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar May 09 '24 01:05 github-actions[bot]

We discussed this in an old Slack thread and decided it isn't an issue.

  • I did have one misunderstanding about the placement of the Filter step, and we decided it actually does make sense where it is
  • The Parser step being there might not exactly line up with the architecture of Fluent Bit, but it does line up logically with what these docs are trying to convey and with general pipeline best practices

So I'll close this issue. Thanks!

braydonk avatar May 09 '24 12:05 braydonk