[Fleet] Support multiple Fleet Servers in Fleet UI
Related to https://github.com/elastic/fleet-server/issues/903
Multiple Fleet Servers UI
We're aiming to support our Fleet scalability efforts by allowing customers to load balance their agents across multiple Fleet Server instances. Currently, when multiple Fleet Server hosts are configured, Fleet Server will round-robin between the configured hosts. We're looking to change this so users can explicitly assign agents to a specific Fleet Server via an agent policy to better support high-scale environments where many agents may be distributed geographically.
To support this functionality, we'll need to make some changes to how we model Fleet Server hosts, namely:
- Fleet Server hosts will be migrated to a new, object-based structure (Fleet Server configs) rather than an array of strings
- Fleet Server configs will be manageable only via a flyout on the `/settings` page, similar to outputs and agent download binaries
From a UX perspective, it's important that we have a "single source of truth" for managing Fleet Server. We want to funnel users to a single UI related to their Fleet Servers to avoid confusion.
Proposed Fleet Server config saved object schema:
```ts
name: string;
host_urls: string[];
is_default: boolean;
```
**Note:** Proxy settings are a separate effort and will be implemented in a follow-up scope. Please ignore them for now in any designs or documentation.
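For concreteness, here is a minimal TypeScript sketch of how such a saved object type could be registered during plugin setup. The type name `fleet-server-config` and the mapping details are assumptions for illustration, not final decisions:

```ts
import type { CoreSetup, SavedObjectsType } from '@kbn/core/server';

// Hypothetical saved object type mirroring the proposed schema above.
const fleetServerConfigType: SavedObjectsType = {
  name: 'fleet-server-config', // illustrative name, not final
  hidden: false,
  namespaceType: 'agnostic',
  mappings: {
    properties: {
      name: { type: 'keyword' },
      host_urls: { type: 'keyword' },
      is_default: { type: 'boolean' },
    },
  },
};

export function registerFleetServerConfigType(core: CoreSetup) {
  core.savedObjects.registerType(fleetServerConfigType);
}
```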
We'll also need to add a setting for which Fleet Server is used when enrolling agents in a given agent policy. This should be displayed alongside the existing output settings for agent policies, and will appear in any enrollment commands we display to the user.
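To make the per-policy selection concrete, here is a hedged TypeScript sketch (illustrative types and function names, not the actual Fleet code) of how the "Add Agent" flyout could resolve which host to show in the enrollment command, falling back to the default Fleet Server config when a policy has no explicit assignment:

```ts
interface FleetServerConfig {
  id: string;
  name: string;
  host_urls: string[];
  is_default: boolean;
}

interface AgentPolicy {
  id: string;
  name: string;
  // Hypothetical per-policy setting introduced by this issue.
  fleet_server_config_id?: string;
}

// Pick the host to show in enrollment instructions: the policy's assigned
// Fleet Server config if present, otherwise the default config.
function resolveFleetServerHost(policy: AgentPolicy, configs: FleetServerConfig[]): string {
  const assigned = configs.find((c) => c.id === policy.fleet_server_config_id);
  const config = assigned ?? configs.find((c) => c.is_default);
  if (!config || config.host_urls.length === 0) {
    throw new Error('No Fleet Server host configured');
  }
  return config.host_urls[0];
}

function buildEnrollCommand(
  policy: AgentPolicy,
  configs: FleetServerConfig[],
  enrollmentToken: string
): string {
  const host = resolveFleetServerHost(policy, configs);
  return `sudo ./elastic-agent install --url=${host} --enrollment-token=${enrollmentToken}`;
}
```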
In addition, we'd like to change the "no Fleet Server" state on the `/agents` landing page in Fleet to direct users to the new `/settings` -> "Add a Fleet Server" flyout workflow.
## ⚒️ Implementation
- [ ] Setup and plumbing
  - [ ] Create a new saved object type for object-based Fleet Server configs
  - [ ] Create a migration that converts existing Fleet Server host settings -> Fleet Server config records (see the sketch after this checklist)
  - [ ] Update `config/kibana.yml` config support to account for the new Fleet Server host schema
  - [ ] Create APIs for creating, editing, and deleting Fleet Server config records
- [ ] New Fleet Server UI on `/settings`
  - [ ] Allow add, edit, and delete actions on Fleet Server config records
  - [ ] Prevent deletion of the Fleet Server host marked as `default`
  - [ ] Replace the UI in Fleet's Settings tab for Fleet Server hosts with a flyout-based UI similar to outputs/agent binary downloads (see designs)
  - [ ] Rework the "Quick Start" and "Advanced" tabs of the existing Fleet Server instructions to account for the new Fleet Server config object data model
  - [ ] Ensure the service token generation process still works with the new Fleet Server structure
  - [ ] Add a new "delete Fleet Server" confirmation modal (see designs) when deleting a Fleet Server config
- [ ] Existing Fleet Server UI conversion
  - [ ] Replace the "no Fleet Server" landing page experience with a simple text block + button that links to the `/settings` page and opens the new Fleet Server flyout
- [ ] Replace the "no Fleet Server" landing page experience with a simple text block + button that links to the
- [ ] Per-policy settings
  - [ ] Add an option in agent policy settings to select a specific Fleet Server config
  - [ ] Implement logic to use the configured Fleet Server host for the selected agent policy in the "Add Agent" flyout, e.g. when we display enrollment tokens
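As a rough illustration of the migration item above, here is a hedged TypeScript sketch of converting the legacy array-of-strings setting into the new config shape. It assumes the legacy hosts collapse into a single default config (since a config can hold multiple `host_urls`) and uses the simple "Fleet Server 1" naming floated in the open questions below; both are assumptions, not decisions:

```ts
interface LegacyFleetServerSettings {
  // Legacy shape: a flat list of Fleet Server host URLs.
  fleet_server_hosts: string[];
}

interface FleetServerConfigAttributes {
  name: string;
  host_urls: string[];
  is_default: boolean;
}

// Collapse the legacy host list into a single default Fleet Server config.
function migrateLegacyFleetServerHosts(
  settings: LegacyFleetServerSettings
): FleetServerConfigAttributes[] {
  if (settings.fleet_server_hosts.length === 0) {
    return [];
  }
  return [
    {
      name: 'Fleet Server 1', // naming scheme is an open question
      host_urls: settings.fleet_server_hosts,
      is_default: true,
    },
  ];
}
```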
## 🎨 Design references

Design samples:

- New settings page
- "No Fleet Server" landing page experience
- "Add a Fleet Server" first-time experience in the new `/settings` flyout
- Edit existing Fleet Server
- Multiple URL UI
- New Delete modal
- Per-policy Fleet Server selection
## ❓ Open Questions
- Instead of spinning up a new `Fleet Server Config` saved object, would it make more sense to expand our existing `ingest_manager_settings` SO type to contain an array of Fleet Server config objects instead of just an array of Fleet Server host strings?
- Will there be new CLI arguments to specify a Fleet Server host by name that we need to account for when we display enrollment commands in various flyouts?
- Where/how does Fleet need to edit agent policies to support these changes?
  - Currently we provide a `fleet.hosts` array of strings that contains the available Fleet Server hosts. Does the behavior or usage of this field need to change?
- How should we determine the name to use when migrating legacy Fleet Server hosts to the new Fleet Server config schema?
  - Maybe a simple `Fleet Server 1` or `Fleet Server 2` style name would suffice
Pinging @elastic/fleet (Team:Fleet)
> Will there be new CLI arguments to specify a Fleet Server host by name we need to account for when we display enrollment commands in various flyouts?

For enrollment you need the host and not the name, as the agent will have to communicate with that Fleet Server, no?
> Will there be new CLI arguments to specify a Fleet Server host by name we need to account for when we display enrollment commands in various flyouts?
>
> For enrollment you need the host and not the name as the agent will have to communicate with that Fleet server no?

I think this coupling will be changed as of this issue. A Fleet Server host object (which maybe needs a better name, e.g. Fleet Server config) might contain multiple host URLs. Since we currently provide the host URL at time of enrollment, it seems to me this will need to change on Fleet Server's side.
Is there any dependency on Fleet Server to make use of the new configs? If so, we might want to add a feature flag in case the changes are not ready at the same time.
> Is there any dependency on Fleet Server to make use of the new configs? If so, we might want to add a feature flag in case the changes are not ready at the same time.
I am not sure about this. Feature flagging may be a safe bet either way, though.
In terms of how these changes interact with Fleet Server, I'm not sure. I think for the actual process of starting Fleet Server nothing needs to change. Kibana will still generate a service token used for bootstrapping a Fleet Server and I don't think there will be any changes to that process. I could be wrong about this.
For the agent enrollment process, Kibana will generate an enrollment token for the enrollment command we present to the user in the "Add Agent" flow, so that agent should know its configured Fleet Server host based on this token.
@narph @michel-laterman could you take a look at this work when you get a chance and provide any thoughts you might have on how the changes in configuration for Fleet Server hosts on the UI side might affect Fleet Server?
@kpollich Regarding this:

> Fleet Server hosts will be manageable only via a flyout on the `/settings` page similar to outputs and agent download binaries

Should we also support this in the Kibana config file? We currently have a setting `xpack.fleet.agents.fleet_server.hosts` that probably needs to be migrated.
Yes good call. I will add an implementation note for updating the Kibana configuration options for Fleet Server hosts to support the new schema.
Edit: Added a checklist item under the setup/plumbing section
One thing I'd like feedback on here and in https://github.com/elastic/kibana/issues/140533 is this open question:

> Instead of spinning up a new Fleet Server Config saved object, would it make more sense to expand our existing ingest_manager_settings SO type to contain an array of fleet server config objects instead of just an array of fleet server host strings?
Does it make more sense to spin up new saved objects types (Fleet Server Config + Proxy) and new CRUD APIs for those saved object types or to add more complex fields to our existing settings saved object?
If we were to expand the `ingest_manager_settings` SO, we would have something like:
```js
{
  fleet_server_configs: [
    { name: "Fleet Server 1", host_urls: ["fleet-server.example.com"], is_default: true }
  ],
  proxies: [
    { name: "US Proxy", url: "proxy.example.com", proxy_header: "...", ... }
  ],
  has_seen_add_data_notice: true,
  has_seen_fleet_migration_notice: true
}
```
I wonder if this would simplify the implementation for this + the proxy work by removing the need to set up new APIs, etc. For outputs and binary download sources, though, we have distinct saved object types since there are enough fields to warrant it. I think proxy settings also have enough fields that they could easily be their own SO, but fleet server configs maybe not.
For consistency of approach, it probably makes sense to pursue separate saved objects and APIs for these various resources we're managing via the settings page, but I wonder if it'd be worth rethinking that approach.
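To make the tradeoff concrete, here is a small TypeScript sketch of the two shapes being compared; the field names mirror the proposals above and neither shape is final:

```ts
// Option A: expand the existing ingest_manager_settings saved object.
interface ExpandedIngestManagerSettings {
  fleet_server_configs: Array<{ name: string; host_urls: string[]; is_default: boolean }>;
  proxies: Array<{ name: string; url: string; proxy_header?: string }>;
  has_seen_add_data_notice: boolean;
  has_seen_fleet_migration_notice: boolean;
}

// Option B: dedicated saved object types, each with its own CRUD API.
interface FleetServerConfigAttributes {
  name: string;
  host_urls: string[];
  is_default: boolean;
}

interface FleetProxyAttributes {
  name: string;
  url: string;
  proxy_header?: string;
}
```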
Curious for others' thoughts.
I am wondering if it makes sense to clean up `has_seen_add_data_notice` and `has_seen_fleet_migration_notice` too. For the first one I did not find where it's used, and `has_seen_fleet_migration_notice` was used for the migration to Fleet Server (in 7.13, I think).
Is there anything in `ingest_manager_settings` that we would keep (and not deprecate)?
I would vote for creating a new SO type for Fleet Server configs, so that it has a more meaningful name (and not the legacy `ingest_manager` in it), and eventually `ingest_manager_settings` can be removed, if all of it is deprecated.
Thanks all, the original implementation plan in the description here will remain valid then.
Hi Team,
We have created 14 test cases for this feature under the Fleet test suite:
- Validate fields under Fleet Settings for Fleet Server and Outputs.
- Validate that under the Add Fleet Server flyout the user gets the option to add a Name, a URL, and additional rows ("Add row").
- Validate the user is able to edit an existing Fleet Server URL under the Settings tab.
- Validate that on hovering over "Add another URL" the message "Specify multiple URLs to scale out your deployment and provide automatic failover" is visible.
- Validate that when editing an existing Fleet Server URL the user is able to add multiple URLs to the Fleet Server URLs.
- Validate the user is able to select a different Fleet Server under Agent Policy settings.
- Validate that if no Fleet Server or standalone agent is installed, the Fleet Server selection under Policy settings remains blank.
- Validate the user is able to add a Fleet Server host under Fleet Settings from the Add Fleet Server flyout.
- Validate the user is able to install agents with any available Fleet Server host.
- Validate that the Add Fleet Server button is available under the Fleet Settings tab and opens the Add Fleet Server flyout.
- Validate that if an agent is installed with Fleet Server 01 and Fleet Server 01 is uninstalled, the installed agent remains Healthy with Fleet Server 02 added under the same host name.
- [Self-Managed]: Validate that on a fresh setup the user gets Name and URL fields in Quick Start mode for generating a policy under the Add Fleet Server flyout.
- [Self-Managed]: Validate that a message is shown when no Fleet Server is installed and the [Add Fleet Server] button is available.
- Validate that on adding multiple Fleet Server URLs, existing agents do not become Unhealthy.
Please let us know if any other scenario is required to be tested from our end.
Thanks!
Hi Team, we have executed the 14 test cases under "Support multiple Fleet Servers in Fleet UI".
Status:
PASS: 13
FAIL: 01
Issue for the failed case: https://github.com/elastic/kibana/issues/146769
Build details: BUILD: 58852, COMMIT: d3a625ef4a6e611a5b3233a1ce5cbe8ef429eb47
Please let us know if anything else is required from our end. Thanks!