fluent-bit-docs
Data Pipeline: documentation doesn't match implementation
I am currently looking through the Data Pipeline section of Fluent Bit's documentation. In particular, I am looking at this graph:
I have two issues with this diagram.
Buffering too late in the pipeline
The Buffer stage's location in the pipeline doesn't seem right to me. My understanding is that Fluent Bit buffers data at input time: input data is added to chunks stored per input, and those chunks are flushed to the next section of the pipeline at a scheduled interval. Filtering can rewrite the chunks when records are modified or dropped, but the data is not newly buffered after the filtering section. It is already in chunks, and those chunks are flushed and routed to output destinations.
It is confusing for buffering to be a separate section here; at the very least, I think it would make more sense for buffering to be attached to the input part of the pipeline.
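To make the ordering described above concrete, here is a rough toy model in Python. It is only a sketch of my understanding, not Fluent Bit's actual data structures or scheduler: `ToyPipeline`, `Chunk`, `ingest`, and `flush` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict
# A filter returns a (possibly modified) record, or None to drop it.
Filter = Callable[[Record], Optional[Record]]


@dataclass
class Chunk:
    tag: str
    records: list = field(default_factory=list)


class ToyPipeline:
    """Toy model only; every name here is hypothetical, not Fluent Bit's API."""

    def __init__(self, filters, outputs):
        self.filters = filters    # list of Filter callables
        self.outputs = outputs    # output name -> list of delivered records
        self.chunks = {}          # one open chunk per input

    def ingest(self, input_name, record):
        # Buffering happens here, at input time.
        chunk = self.chunks.setdefault(input_name, Chunk(tag=input_name))
        chunk.records.append(record)

    def flush(self):
        # In this model, flush() stands in for the scheduled flush interval.
        for chunk in self.chunks.values():
            rewritten = []
            for record in chunk.records:
                for apply_filter in self.filters:
                    record = apply_filter(record)
                    if record is None:
                        break     # record dropped by a filter
                if record is not None:
                    rewritten.append(record)
            # The existing chunk is rewritten; nothing is buffered anew.
            chunk.records = rewritten
            for delivered in self.outputs.values():
                delivered.extend(chunk.records)
        self.chunks.clear()
```

In this sketch there is no point after filtering where data enters a buffer for the first time, which is why a Buffer box placed after Filter looks misplaced to me.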
Parsing as a separate section
Parsing data at that step of the pipeline does make sense as a best practice, but if this section of the documentation is intended to be a direct representation of Fluent Bit's data pipeline, then showing Parser as a separate step seems inaccurate to me. Some input plugins support attaching a parser as part of ingestion (in_tail, for example), but otherwise a parser is applied as a filter rather than as a separate part of the pipeline. It is best practice to parse as early as is reasonable so that other filter operations can be applied effectively, but parsing is not a distinct step of the pipeline in the implementation.
The docs written in that section seem to support this being a bit of an inaccuracy. Quoting from the docs:
The Parser allows you to convert from unstructured to structured data.
What is The Parser here? The writing here suggests that The Parser (capitalized as a proper noun) is a distinct unit of the pipeline, not something that is part of either input or filtering.
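As a rough illustration of the point, the sketch below shows the same parser function being used in the two places I described: attached to an input during ingestion, or applied later as a filter. This is a toy Python model, not Fluent Bit code; `tail_like_input`, `parser_filter`, and the `log` key default are assumptions made up for the example.

```python
import json
from typing import Callable, Optional

Parser = Callable[[str], dict]


def json_parser(raw: str) -> dict:
    """Turn one unstructured JSON line into a structured record."""
    return json.loads(raw)


def tail_like_input(lines, parser: Optional[Parser] = None):
    """Hypothetical input: optionally parses each line during ingestion."""
    for line in lines:
        # Without a parser, the raw line is wrapped under an assumed "log" key.
        yield parser(line) if parser else {"log": line}


def parser_filter(records, parser: Parser, key: str = "log"):
    """Hypothetical filter: parses the raw text stored under `key`."""
    for record in records:
        # Records that lack the key pass through unchanged.
        yield parser(record[key]) if key in record else record
```

Either `tail_like_input(lines, parser=json_parser)` or `parser_filter(tail_like_input(lines), json_parser)` produces structured records. In both cases the parser is something an input or a filter calls, not a stage of its own.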
The section about parsing could make sense if these docs were not originally intended to represent the actual internal Fluent Bit pipeline and rather some kind of "ideal pipeline setup". If that is the intention of these docs, then I believe that should be qualified more directly in the introduction to the section.
However, whether these docs are intended to represent a best practice setup OR the actual internals of the Fluent Bit pipeline, I believe the buffering section is inaccurate based on my understanding of the actual implementation.
CC @pwhelan with whom I discussed the concerns before opening an issue.
Tagging @patrick-stephens and @lecaros per suggestion in Slack
I think the best option here @braydonk is to suggest updates via a PR. Realistically it is the main way it will actually be done and then it is much more concrete to review.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.
We discussed this in an old Slack thread and decided it isn't an issue.
- I did have one misunderstanding about the placement of the Filter step, and we decided it actually does make sense to be there
- The Parser step being there might not exactly line up with the architecture of Fluent Bit, but it does line up logically with what these docs are trying to convey and with general pipeline best practices
So I'll close this issue. Thanks!