Proposal: Flow release plan
Summary
RFC-0004 describes Grafana Agent Flow, an alternative way of configuring Grafana Agent with the goals of making Grafana Agent easier to understand and debug by introducing components.
The ongoing implementation of Flow functionality is a massive departure from how the agent works today, and so we need a plan for how users will run Flow.
Eventually, there should be only one way to run the agent. However, Flow is not guaranteed to be the future of the agent; we need to collect feedback from users. This means that either Flow will replace the existing "static" mode (with tooling for users to migrate) or the experimental Flow functionality will be removed and the static mode will be kept. While removing the static mode may be controversial, I believe it is better to end up with one way to run the agent rather than two.
Terminology
- Static mode: The agent config mechanism as it exists today
- Flow / Flow mode: The Grafana Agent Flow config mechanism as described in RFC-0004
Options for running Flow
Before discussing the release plan, we must discuss how users will be able to run Flow while it is an experimental feature.
Option 1: New Flow-specific binary
A Flow-specific binary grafana-agent-flow can be introduced during the development of Flow while it is experimental. Either the existing binary or the new binary will eventually be removed once there is only one way to run the agent.
Pros:
- Easy for maintainers to implement
- Makes it easier to have a clean break from the existing binary (i.e., don't inherit existing flags)
- Allows for subcommands in the new binary like grafana-agent-flow lint.
Cons:
- Two binaries may be confusing to users
- Will cause a headache once we have to decide if Flow should become the only way to run the agent.
Option 2: Enable Flow through an environment variable
With this option, the new route could be enabled by setting an environment variable like EXPERIMENTAL_ENABLE_FLOW=1. This could act as an entirely separate entrypoint, behaving similarly to a separate binary.
Pros:
- Same Pros as option 1, as it acts like a separate binary.
- Less confusing to users than option 1: there's still just one binary.
- Easy to allow Flow to be the default mode: just remove the old entrypoint.
Cons:
- Using environment variables to determine agent behavior isn't a pattern we use, and may be strange to users.
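A minimal sketch of how such an entrypoint switch might look, assuming that only the value 1 enables Flow. The variable name comes from the proposal; flowEnabled, dispatch, runFlow, and runStatic are hypothetical names used for illustration, not existing agent code.

```go
package main

import (
	"fmt"
	"os"
)

// flowEnabled reports whether the experimental Flow entrypoint should be used.
// Treating only "1" as enabled is an assumption of this sketch.
func flowEnabled(getenv func(string) string) bool {
	return getenv("EXPERIMENTAL_ENABLE_FLOW") == "1"
}

// runFlow and runStatic are hypothetical stand-ins for the two entrypoints.
// Each would own its flag parsing, so Flow inherits no static-mode flags.
func runFlow(args []string) string   { return "flow" }
func runStatic(args []string) string { return "static" }

// dispatch picks an entrypoint before any command line flags are parsed.
func dispatch(getenv func(string) string, args []string) string {
	if flowEnabled(getenv) {
		return runFlow(args)
	}
	return runStatic(args)
}

func main() {
	fmt.Println("mode:", dispatch(os.Getenv, os.Args[1:]))
}
```

Passing the environment lookup in as a function keeps the decision testable without touching the real process environment.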
Option 3: Enable Flow through subcommands
The agent binary could be updated to support subcommands, and flow mode could be enabled through agent flow, with the current static mode being the default command, so that agent defaults to agent static.
Pros:
- Same Pros as option 2: it acts like a separate binary without needing to confuse users.
- More in-line with how the agent is configured today, where environment variables aren't used.
Cons:
- We may struggle to find a library that allows us to do this while keeping the flag parsing the same as it is today, where both -config.file and --config.file are valid syntaxes.
- Similar issue to having multiple binaries: what does the agent flow command change to if it is the only mode?
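One possibility, not evaluated in this proposal, is to do the subcommand dispatch by hand on top of Go's standard flag package, which treats one and two leading dashes as equivalent. The sketch below assumes that approach; parseCommand is a hypothetical helper, and only the config.file flag is modeled.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseCommand splits args into a mode and a config file path.
// If the first argument is not "flow" or "static" (for example, it is a
// flag), the mode falls back to "static", so that "agent" behaves like
// "agent static".
func parseCommand(args []string) (mode, configFile string, err error) {
	mode = "static"
	if len(args) > 0 && (args[0] == "flow" || args[0] == "static") {
		mode, args = args[0], args[1:]
	}

	// The standard flag package accepts both -config.file and --config.file.
	fs := flag.NewFlagSet("agent "+mode, flag.ContinueOnError)
	fs.StringVar(&configFile, "config.file", "", "path to the configuration file")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	return mode, configFile, nil
}

func main() {
	mode, configFile, err := parseCommand(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("mode=%s config=%s\n", mode, configFile)
}
```

This does not answer the second concern above: if Flow becomes the only mode, the flow subcommand still needs a migration story.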
Option 4: Enable Flow through feature and file type flags
Finally, we could treat Flow the same way we treat other experimental features, enabling it through agent -enable-features=flow -config.file.type=flow.
Pros:
- Most similar option to how the Agent works today
- Is the only option that wouldn't require updating usagestats to understand whether Flow is being used
Cons:
- Way harder to cleanly separate the Flow and static modes.
- Potentially more confusing for users: it will be unclear which command line flags are used for which mode.
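To make the flag interplay concrete, here is a rough sketch of how the two flags could be checked together. The flag names come from the command above; useFlow is a hypothetical helper, and both the yaml default for the file type and the rule that the file type requires the feature flag are assumptions of this sketch.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// useFlow reports whether the given arguments select Flow mode.
// It returns an error when the flow file type is requested without the
// matching experimental feature being enabled.
func useFlow(args []string) (bool, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	features := fs.String("enable-features", "", "comma-separated list of experimental features")
	fileType := fs.String("config.file.type", "yaml", "type of the configuration file")
	if err := fs.Parse(args); err != nil {
		return false, err
	}

	// Look for "flow" in the comma-separated feature list.
	enabled := false
	for _, f := range strings.Split(*features, ",") {
		if strings.TrimSpace(f) == "flow" {
			enabled = true
		}
	}

	if *fileType == "flow" && !enabled {
		return false, fmt.Errorf("-config.file.type=flow requires -enable-features=flow")
	}
	return enabled && *fileType == "flow", nil
}

func main() {
	flow, err := useFlow(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("flow enabled:", flow)
}
```

Even in this small form, both modes share one flag set, which is the separation problem listed in the cons.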
Proposal
I believe we should go with option 2, enabling Flow through an environment variable.
This option is one of the easiest to implement, while avoiding all future confusion around migration paths. While unusual, using an environment variable to enable Flow mode may be justified given how much of a departure Flow mode is from the static agent mode.
Once Flow is released, we should collect as much feedback from users as possible and iterate on Flow to improve it. We can also abandon the Flow experiment if reception is negative.
If we decide to continue with Flow and we deem it stable, the environment variable should change to AGENT_MODE=flow, where users may also specify AGENT_MODE=static for the existing behavior. At this point, migration tooling should be available to users for migrating from static mode to flow mode. The inverse migration will not be possible, due to the capabilities Flow enables.
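The stable-phase variable could be resolved along these lines. This is a sketch only: AGENT_MODE and its two values come from the proposal, while resolveMode, the fallback to static when the variable is unset, and the rejection of unknown values are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// resolveMode maps the AGENT_MODE environment variable to a run mode.
// An unset variable falls back to "static" (assumed default while static
// mode is still the default); unknown values are rejected rather than
// silently ignored.
func resolveMode(getenv func(string) string) (string, error) {
	switch mode := getenv("AGENT_MODE"); mode {
	case "", "static":
		return "static", nil
	case "flow":
		return "flow", nil
	default:
		return "", fmt.Errorf("unknown AGENT_MODE %q (want \"static\" or \"flow\")", mode)
	}
}

func main() {
	mode, err := resolveMode(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("running in", mode, "mode")
}
```

Making Flow the default later would then only require changing the fallback in the first case.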
After a few months of Flow being stable, we should determine what the singular way to run the Agent should be, deprecating the other mode (either static or Flow). This should follow the normal deprecation schedule, removing the deprecated mode after two releases.
Timeline
Assuming Flow is received well and will replace static mode, the rough timeline (assuming a release every month) looks like this:
- v0.27 (2022/08): Release Flow as experimental
- v0.31 (2022/12): Graduate Flow to a stable feature with migration tooling
- v0.33 (2023/02): Make Flow the default mode; deprecate static mode
- v0.35 (2023/04): Remove static mode
This timeline allows Flow four months for iteration while feedback is being collected. This is just a suggested timeline; the real timeline may be shorter or longer than this.
With the above timeline, v0.35 could instead be released as v1.0, but deciding when we hit v1.0 is out of scope for this proposal.
Option 2 sounds reasonable. Timeline is a bit more flexible, but lays out a vision of what to expect.
+1 to the proposed option and timeline.
This should probably turn into an RFC instead of an issue.
We shipped Flow ages ago, closing as accepted and completed :)