[Fleet] Introduce Async deploy policies
Summary
Related https://github.com/elastic/ingest-dev/issues/3343
Deploying a large number of policies is an expensive operation in Fleet, because we need to fetch a lot of related data to build the .fleet-policies documents. This PR proposes moving that work, for bulk operations, to an async task.
@juliaElastic @kpollich before I move forward with this, do you think it's a direction we should follow, and do you see any issues with it?
Performance impact
(tested locally on a MacBook)
Updating the default output with 1000 agent policies
With the feature flag on ~3s
With the feature flag off ~44s
That is a big improvement, and local testing suggests it will let Fleet reach a much higher scale, such as 5k agent policies.
Cons
This approach makes unexpected errors during deploys harder to catch, as they will only appear in the logs and no longer in the UI.
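To illustrate the concern, here is a minimal sketch of how a background deploy could return a summary of failed policies instead of relying on logs alone. It is not the code in this PR: `deployWithRetries`, `DeployFailure`, the `deployOne` callback and the retry count are all hypothetical, and the deploy call is kept synchronous only to keep the example short.

```typescript
// Hypothetical sketch, not the Fleet implementation.
interface DeployFailure {
  policyId: string;
  attempts: number;
  message: string;
}

// Tries each policy up to `maxAttempts` times and returns the ones that
// never succeeded, so a caller could log them, store them, or retry later.
function deployWithRetries(
  policyIds: string[],
  deployOne: (policyId: string) => void, // assumed stand-in for a single deploy
  maxAttempts = 3
): DeployFailure[] {
  const failures: DeployFailure[] = [];
  for (const policyId of policyIds) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        deployOne(policyId);
        break; // deployed, move on to the next policy
      } catch (err) {
        if (attempt === maxAttempts) {
          failures.push({
            policyId,
            attempts: attempt,
            message: err instanceof Error ? err.message : String(err),
          });
        }
      }
    }
  }
  return failures;
}
```

A caller could persist the returned list somewhere the UI can read it, which is one possible way to keep failures visible outside the logs.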
I think deploying policies asynchronously makes a lot of sense; fleet-server picks up policies asynchronously anyway.
As you stated we should make it easy to find out from the logs/state when the deploy was not successful, and retry (which the task seems to do already).
Maybe we could store the latest deployed revision on the SO or query it to display on the UI when .fleet-policies has an outdated or out of sync latest revision. The UI already shows when agents use an outdated version of an agent policy.
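The out-of-sync check suggested above could look roughly like the following. This is only a sketch under assumptions: the two interfaces and their field names are hypothetical simplifications of the agent policy saved object and of the documents in .fleet-policies, and fetching the data is left out.

```typescript
// Hypothetical shapes, simplified for illustration.
interface AgentPolicyRevision {
  id: string;
  revision: number; // current revision on the saved object
}

interface DeployedPolicyDoc {
  policyId: string;
  revision: number; // revision stored in a .fleet-policies document
}

// Returns the ids of policies whose latest deployed revision is behind the
// revision on the saved object (or that have never been deployed at all).
function findOutOfSyncPolicies(
  policies: AgentPolicyRevision[],
  deployedDocs: DeployedPolicyDoc[]
): string[] {
  const latestDeployed = new Map<string, number>();
  for (const doc of deployedDocs) {
    const known = latestDeployed.get(doc.policyId) ?? 0;
    if (doc.revision > known) {
      latestDeployed.set(doc.policyId, doc.revision);
    }
  }
  return policies
    .filter((policy) => (latestDeployed.get(policy.id) ?? 0) < policy.revision)
    .map((policy) => policy.id);
}
```

The UI could then flag the returned policies in the same way it already flags agents running an outdated policy revision.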
/ci
Pinging @elastic/fleet (Team:Fleet)
@elasticmachine merge upstream
LGTM, we should run some scale tests to see if multiple policy changes are affected in any way.
I will run some, but it should not affect the scale tests: deploying a single policy is still a sync operation, and I think that is what we test in the multiple policy change scenario.
Is there a condition to only use the task for multiple policies? I'm not seeing that in the code, only the feature flag condition.
The deployPolicies method has not been changed; the parts I changed are the methods that trigger a bulk deploy, for example _bumpPolicies.
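The split described here can be pictured with a small sketch. It does not reproduce the code in this PR: `BulkDeployDeps`, `deployInline`, `enqueueTask` and the flag name are hypothetical stand-ins, and the calls are synchronous only to keep the example short.

```typescript
// Hypothetical sketch of the dispatch, not the actual Fleet code.
type DeployMode = "inline" | "background";

interface BulkDeployDeps {
  asyncDeployEnabled: boolean; // assumed feature flag value
  deployInline: (policyIds: string[]) => void; // stand-in for the unchanged deploy path
  enqueueTask: (policyIds: string[]) => void; // stand-in for scheduling a background task
}

// A single-policy deploy never consults the flag and always runs inline.
function deploySinglePolicy(deps: BulkDeployDeps, policyId: string): DeployMode {
  deps.deployInline([policyId]);
  return "inline";
}

// Only bulk callers branch on the flag and hand the work to a task.
function deployManyPolicies(deps: BulkDeployDeps, policyIds: string[]): DeployMode {
  if (deps.asyncDeployEnabled) {
    deps.enqueueTask(policyIds);
    return "background";
  }
  deps.deployInline(policyIds);
  return "inline";
}
```

Keeping the branch in the bulk callers means the single-policy path behaves the same whether the flag is on or off.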
:green_heart: Build Succeeded
- Buildkite Build
- Commit: a69c4d97bed823fe5354940231a9dcbb2dbe336f
- Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-191839-a69c4d97bed8
Metrics [docs]
Async chunks
Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app
| id | before | after | diff |
|---|---|---|---|
| fleet | 1.8MB | 1.8MB | +16.0B |
Page load bundle
Size of the bundles that are downloaded on every page load. Target size is below 100kb
| id | before | after | diff |
|---|---|---|---|
| fleet | 169.7KB | 169.8KB | +23.0B |
History
- :yellow_heart: Build #232308 was flaky ace46dcfe00050fc2725551b3978b9bd13fdab5d
- :yellow_heart: Build #232217 was flaky a8a2d380e96b948ae4484944b98dd7023398d51d
- :green_heart: Build #232041 succeeded a30513578ae2ce67dcc1eb85a9516a552b68a809
- :green_heart: Build #231956 succeeded 99035210d800f2204f1dead4082cf67b252bf955
- :green_heart: Build #231934 succeeded 45291b03de72ceb3b33662ea5170ea4e97d91c20