[Fleet] Support multiple Fleet Servers in Fleet UI
Related to https://github.com/elastic/fleet-server/issues/903
Multiple Fleet Servers UI
We're aiming to support our Fleet scalability efforts by allowing customers to load balance their agents across multiple Fleet Server instances. Currently, when multiple Fleet Server hosts are configured, Fleet Server will round-robin between the configured hosts. We're looking to change this so users can explicitly assign agents to a specific Fleet Server via an agent policy to better support high-scale environments where many agents may be distributed geographically.
To support this functionality, we'll need to make some changes to how we model Fleet Server hosts, namely:
- Fleet Server hosts will be migrated to a new, object-based structure (Fleet Server configs) rather than an array of strings
- Fleet Server configs will be manageable only via a flyout on the `/settings` page, similar to outputs and agent download binaries
From a UX perspective, it's important that we have a "single source of truth" for managing Fleet Server. We want to funnel users to a single UI related to their Fleet Servers to avoid confusion.
Proposed Fleet Server config saved object schema:
```ts
name: string;
host_urls: string[];
is_default: boolean;
```
**Note:** Proxy settings are a separate effort and will be implemented in a follow-up scope. Please ignore them for now in any designs or documentation.
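For concreteness, here is a minimal TypeScript sketch of how such a saved object type could be registered during plugin setup. The type name `fleet-server-config` and the mapping details are assumptions for illustration, not final decisions:

```ts
import type { CoreSetup, SavedObjectsType } from '@kbn/core/server';

// Hypothetical saved object type mirroring the proposed schema above.
const fleetServerConfigType: SavedObjectsType = {
  name: 'fleet-server-config', // illustrative name, not final
  hidden: false,
  namespaceType: 'agnostic',
  mappings: {
    properties: {
      name: { type: 'keyword' },
      host_urls: { type: 'keyword' },
      is_default: { type: 'boolean' },
    },
  },
};

export function registerFleetServerConfigType(core: CoreSetup) {
  core.savedObjects.registerType(fleetServerConfigType);
}
```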
We'll also need to add a setting for which Fleet Server is used when enrolling agents in a given agent policy. This should be displayed alongside the existing output settings for agent policies, and will appear in any enrollment commands we display to the user.
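To make the per-policy selection concrete, here is a hedged TypeScript sketch (illustrative types and function names, not the actual Fleet code) of how the "Add Agent" flyout could resolve which host to show in the enrollment command, falling back to the default Fleet Server config when a policy has no explicit assignment:

```ts
interface FleetServerConfig {
  id: string;
  name: string;
  host_urls: string[];
  is_default: boolean;
}

interface AgentPolicy {
  id: string;
  name: string;
  // Hypothetical per-policy setting introduced by this issue.
  fleet_server_config_id?: string;
}

// Pick the host to show in enrollment instructions: the policy's assigned
// Fleet Server config if present, otherwise the default config.
function resolveFleetServerHost(policy: AgentPolicy, configs: FleetServerConfig[]): string {
  const assigned = configs.find((c) => c.id === policy.fleet_server_config_id);
  const config = assigned ?? configs.find((c) => c.is_default);
  if (!config || config.host_urls.length === 0) {
    throw new Error('No Fleet Server host configured');
  }
  return config.host_urls[0];
}

function buildEnrollCommand(
  policy: AgentPolicy,
  configs: FleetServerConfig[],
  enrollmentToken: string
): string {
  const host = resolveFleetServerHost(policy, configs);
  return `sudo ./elastic-agent install --url=${host} --enrollment-token=${enrollmentToken}`;
}
```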
In addition, we'd like to change the "no Fleet Server" state on the `/agents` landing page in Fleet to direct users to the new `/settings` -> "Add a Fleet Server" flyout workflow.
## ⚒️ Implementation
- [ ] Setup and plumbing
  - [ ] Create a new saved object type for object-based Fleet Server configs
  - [ ] Create a migration that converts existing Fleet Server host settings -> Fleet Server config records (see the sketch after this checklist)
  - [ ] Update `config/kibana.yml` config support to account for the new Fleet Server host schema
  - [ ] Create APIs for creating, editing, and deleting Fleet Server config records
- [ ] New Fleet Server UI on `/settings`
  - [ ] Allow add, edit, and delete actions on Fleet Server config records
  - [ ] Prevent deletion of the Fleet Server host marked as `default`
  - [ ] Replace the UI in Fleet's Settings tab for Fleet Server hosts with a flyout-based UI similar to outputs/agent binary downloads (see designs)
  - [ ] Rework the "Quick Start" and "Advanced" tabs of the existing Fleet Server instructions to account for the new Fleet Server config object data model
  - [ ] Ensure the service token generation process still works with the new Fleet Server structure
  - [ ] Add a new "delete Fleet Server" confirmation modal (see designs) when deleting a Fleet Server config
- [ ] Existing Fleet Server UI conversion
  - [ ] Replace the "no Fleet Server" landing page experience with a simple text block + button that links to the `/settings` page and opens the new Fleet Server flyout
- [ ] Replace the "no Fleet Server" landing page experience with a simple text block + button that links to the
- [ ] Per-policy settings
  - [ ] Add an option in agent policy settings to select a specific Fleet Server config
  - [ ] Implement logic to use the configured Fleet Server host for the selected agent policy in the "Add Agent" flyout, e.g. when we display enrollment tokens
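As a rough illustration of the migration item above, here is a hedged TypeScript sketch of converting the legacy array-of-strings setting into the new config shape. It assumes the legacy hosts collapse into a single default config (since a config can hold multiple `host_urls`) and uses the simple "Fleet Server 1" naming floated in the open questions below; both are assumptions, not decisions:

```ts
interface LegacyFleetServerSettings {
  // Legacy shape: a flat list of Fleet Server host URLs.
  fleet_server_hosts: string[];
}

interface FleetServerConfigAttributes {
  name: string;
  host_urls: string[];
  is_default: boolean;
}

// Collapse the legacy host list into a single default Fleet Server config.
function migrateLegacyFleetServerHosts(
  settings: LegacyFleetServerSettings
): FleetServerConfigAttributes[] {
  if (settings.fleet_server_hosts.length === 0) {
    return [];
  }
  return [
    {
      name: 'Fleet Server 1', // naming scheme is an open question
      host_urls: settings.fleet_server_hosts,
      is_default: true,
    },
  ];
}
```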
## 🎨 Design references

Design samples:

- New settings page
- "No Fleet Server" landing page experience
- "Add a Fleet Server" first-time experience in the new `/settings` flyout
- Edit existing Fleet Server
- Multiple URL UI
- New Delete modal
- Per-policy Fleet Server selection
## ❓ Open Questions
- Instead of spinning up a new `Fleet Server Config` saved object, would it make more sense to expand our existing `ingest_manager_settings` SO type to contain an array of Fleet Server config objects instead of just an array of Fleet Server host strings?
- Will there be new CLI arguments to specify a Fleet Server host by name that we need to account for when we display enrollment commands in various flyouts?
- Where/how does Fleet need to edit agent policies to support these changes?
  - Currently we provide a `fleet.hosts` array of strings that contains the available Fleet Server hosts. Does the behavior or usage of this field need to change?
- How should we determine the name to use when migrating legacy Fleet Server hosts to the new Fleet Server config schema?
  - Maybe a simple `Fleet Server 1` or `Fleet Server 2` style name would suffice
Pinging @elastic/fleet (Team:Fleet)
> Will there be new CLI arguments to specify a Fleet Server host by name we need to account for when we display enrollment commands in various flyouts?

For enrollment you need the host and not the name, as the agent will have to communicate with that Fleet Server, no?
> Will there be new CLI arguments to specify a Fleet Server host by name we need to account for when we display enrollment commands in various flyouts?
>
> For enrollment you need the host and not the name as the agent will have to communicate with that Fleet server no?

I think this coupling will be changed as of this issue. A Fleet Server host object (which maybe needs a better name, e.g. Fleet Server config) might contain multiple host URLs. Since we currently provide the host URL at time of enrollment, it seems to me this will need to change on Fleet Server's side.
Is there any dependency on Fleet Server to make use of the new configs? If so, we might want to add a feature flag in case the changes are not ready at the same time.
> Is there any dependency on Fleet Server to make use of the new configs? If so, we might want to add a feature flag in case the changes are not ready at the same time.
I am not sure about this. Feature flagging may be a safe bet either way, though.
In terms of how these changes interact with Fleet Server, I'm not sure. I think for the actual process of starting Fleet Server nothing needs to change. Kibana will still generate a service token used for bootstrapping a Fleet Server and I don't think there will be any changes to that process. I could be wrong about this.
For the agent enrollment process, Kibana will generate an enrollment token for the enrollment command we present to the user in the "Add Agent" flow, so that agent should know its configured Fleet Server host based on this token.
@narph @michel-laterman could you take a look at this work when you get a chance and provide any thoughts you might have on how the changes in configuration for Fleet Server hosts on the UI side might affect Fleet Server?
@kpollich Regarding this:

> Fleet Server hosts will be manageable only via a flyout on the `/settings` page similar to outputs and agent download binaries

Should we also support this in the Kibana config file? We currently have a setting `xpack.fleet.agents.fleet_server.hosts` that probably needs to be migrated.
Yes good call. I will add an implementation note for updating the Kibana configuration options for Fleet Server hosts to support the new schema.
Edit: Added a checklist item under the setup/plumbing section
One thing I'd like feedback on here and in https://github.com/elastic/kibana/issues/140533 is this open question:

> Instead of spinning up a new Fleet Server Config saved object, would it make more sense to expand our existing ingest_manager_settings SO type to contain an array of fleet server config objects instead of just an array of fleet server host strings?
Does it make more sense to spin up new saved objects types (Fleet Server Config + Proxy) and new CRUD APIs for those saved object types or to add more complex fields to our existing settings saved object?
If we were to expand the `ingest_manager_settings` SO, we would have something like:
```js
{
  fleet_server_configs: [
    { name: "Fleet Server 1", host_urls: ["fleet-server.example.com"], is_default: true }
  ],
  proxies: [
    { name: "US Proxy", url: "proxy.example.com", proxy_header: "...", ... }
  ],
  has_seen_add_data_notice: true,
  has_seen_fleet_migration_notice: true
}
```
I wonder if this would simplify the implementation for this + the proxy work by removing the need to set up new APIs, etc. For outputs and binary download sources, though, we have distinct saved object types since there are enough fields to warrant it. I think proxy settings also have enough fields that they could easily be their own SO, but fleet server configs maybe not.
For consistency of approach, it probably makes sense to pursue separate saved objects and APIs for these various resources we're managing via the settings page, but I wonder if it'd be worth rethinking that approach.
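To make the tradeoff concrete, here is a small TypeScript sketch of the two shapes being compared; the field names mirror the proposals above and neither shape is final:

```ts
// Option A: expand the existing ingest_manager_settings saved object.
interface ExpandedIngestManagerSettings {
  fleet_server_configs: Array<{ name: string; host_urls: string[]; is_default: boolean }>;
  proxies: Array<{ name: string; url: string; proxy_header?: string }>;
  has_seen_add_data_notice: boolean;
  has_seen_fleet_migration_notice: boolean;
}

// Option B: dedicated saved object types, each with its own CRUD API.
interface FleetServerConfigAttributes {
  name: string;
  host_urls: string[];
  is_default: boolean;
}

interface FleetProxyAttributes {
  name: string;
  url: string;
  proxy_header?: string;
}
```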
Curious for others' thoughts.
I am wondering if it makes sense to clean up `has_seen_add_data_notice` and `has_seen_fleet_migration_notice` too. For the first one I did not find where it's used, and `has_seen_fleet_migration_notice` was used for the migration to Fleet Server (in 7.13, I think).
Is there anything in `ingest_manager_settings` that we would keep (and not deprecate)?
I would vote for creating a new SO type for Fleet Server configs, so that it has a more meaningful name (and not the legacy `ingest_manager` in it), and eventually `ingest_manager_settings` can be removed, if all of it is deprecated.
Thanks all, the original implementation plan in the description here will remain valid then.
Hi Team,
We have created 14 test cases for this feature under the Fleet test suite:
- Validate fields under Fleet Settings for Fleet Server and Outputs.
- Validate that under the Add Fleet Server flyout the user gets the option to add a Name, a URL, and additional rows ("Add row").
- Validate the user is able to edit an existing Fleet Server URL under the Settings tab.
- Validate that on hovering over "Add another URL" the message "Specify multiple URLs to scale out your deployment and provide automatic failover" is visible.
- Validate that when editing an existing Fleet Server URL the user is able to add multiple URLs to the Fleet Server URLs.
- Validate the user is able to select a different Fleet Server under Agent Policy settings.
- Validate that if no Fleet Server or standalone agent is installed, the Fleet Server selection under Policy settings remains blank.
- Validate the user is able to add a Fleet Server host under Fleet Settings from the Add Fleet Server flyout.
- Validate the user is able to install agents with any available Fleet Server host.
- Validate that the Add Fleet Server button is available under the Fleet Settings tab and opens the Add Fleet Server flyout.
- Validate that if an agent is installed with Fleet Server 01 and Fleet Server 01 is uninstalled, the installed agent remains Healthy with Fleet Server 02 added under the same host name.
- [Self-Managed]: Validate that on a fresh setup the user gets Name and URL fields in Quick Start mode for generating a policy under the Add Fleet Server flyout.
- [Self-Managed]: Validate that a message is shown when no Fleet Server is installed and the [Add Fleet Server] button is available.
- Validate that on adding multiple Fleet Server URLs, existing agents do not become Unhealthy.
Please let us know if any other scenario is required to be tested from our end.
Thanks!
Hi Team, we have executed the 14 test cases under "Support multiple Fleet Servers in Fleet UI".
Status:
PASS: 13
FAIL: 01
Issue for the failed case: https://github.com/elastic/kibana/issues/146769
Build details: BUILD: 58852, COMMIT: d3a625ef4a6e611a5b3233a1ce5cbe8ef429eb47
Please let us know if anything else is required from our end. Thanks!