[Fleet] Introduce Async deploy policies

Open nchaulet opened this issue 1 year ago • 1 comments

Summary

Related https://github.com/elastic/ingest-dev/issues/3343

Deploying a lot of policies is an expensive operation in Fleet, as we need to fetch a lot of related data to build the .fleet-policies documents. This PR proposes moving that work to an async task for bulk operations.
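To make the idea concrete, here is a minimal sketch of the pattern, not the code in this PR. The request path only records which policies need deploying, and a background task later works through them in batches, retrying failures. `DeployQueue`, `runOnce`, the batch size and the retry limit are all hypothetical names and values chosen for illustration. The `deploy` callback is synchronous only to keep the sketch short; the real operation would be asynchronous.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Fleet's actual API.
type PolicyId = string;

interface QueueItem {
  policyId: PolicyId;
  attempts: number;
}

interface RunResult {
  deployed: PolicyId[];
  failed: PolicyId[]; // dropped after reaching maxAttempts
}

class DeployQueue {
  // Map keeps insertion order and de-duplicates policy ids.
  private pending = new Map<PolicyId, QueueItem>();

  // Called from the request path: cheap, no related data is fetched here.
  enqueue(policyIds: PolicyId[]): void {
    for (const policyId of policyIds) {
      if (!this.pending.has(policyId)) {
        this.pending.set(policyId, { policyId, attempts: 0 });
      }
    }
  }

  size(): number {
    return this.pending.size;
  }

  // Called by the background task: deploys at most batchSize policies per run.
  runOnce(
    deploy: (policyId: PolicyId) => void,
    batchSize: number,
    maxAttempts: number = 3
  ): RunResult {
    const result: RunResult = { deployed: [], failed: [] };
    const batch = Array.from(this.pending.values()).slice(0, batchSize);
    for (const item of batch) {
      try {
        deploy(item.policyId);
        this.pending.delete(item.policyId);
        result.deployed.push(item.policyId);
      } catch (err) {
        item.attempts += 1;
        if (item.attempts >= maxAttempts) {
          // Give up and report the failure so it can be logged or surfaced.
          this.pending.delete(item.policyId);
          result.failed.push(item.policyId);
        }
        // Otherwise the item stays queued and is retried on the next run.
      }
    }
    return result;
  }
}
```

In this shape, the `failed` list is the natural place to hook in logging or a status field, since errors no longer reach the caller of `enqueue`.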

@juliaElastic @kpollich before I move forward with this, do you think it's a direction we should follow, and do you see any issues with it?

Performance impact

(tested locally on a macbook)

Updating the default output with 1000 agent policies

With the feature flag on: ~3s [Screenshot]

With the feature flag off: ~44s [Screenshot]

That is a big improvement, and local testing suggests this will let us reach a much higher scale, such as 5k agent policies. [Screenshot]

Cons

This approach will make unexpected errors during deploys harder to catch, as they will only appear in the logs and no longer in the UI.

nchaulet avatar Aug 30 '24 14:08 nchaulet

:robot: GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

obltmachine avatar Aug 30 '24 14:08 obltmachine

I think deploying policies asynchronously makes a lot of sense; fleet-server picks up policies asynchronously anyway. As you stated, we should make it easy to find out from the logs or state when a deploy was not successful, and to retry (which the task already seems to do). Maybe we could store the latest deployed revision on the SO, or query it, so the UI can show when .fleet-policies has an outdated or out-of-sync latest revision. The UI already shows when agents use an outdated version of an agent policy.
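The out-of-sync check suggested here could look roughly like the sketch below. It compares the revision stored on each agent policy with the highest revision found among the deployed documents. The interfaces and field names (`revision`, `revisionIdx`, `policyId`) are assumptions made for illustration and are not taken from the Fleet code.

```typescript
// Hypothetical shapes: field names are assumptions for illustration only.
interface AgentPolicyRevision {
  id: string;
  revision: number; // revision stored on the agent policy saved object
}

interface DeployedRevision {
  policyId: string;
  revisionIdx: number; // revision of one deployed policy document
}

interface OutOfSyncPolicy {
  id: string;
  expected: number;
  deployed: number | null; // null when nothing has been deployed yet
}

function findOutOfSyncPolicies(
  policies: AgentPolicyRevision[],
  deployedDocs: DeployedRevision[]
): OutOfSyncPolicy[] {
  // Keep only the highest deployed revision per policy.
  const latest = new Map<string, number>();
  for (const doc of deployedDocs) {
    const current = latest.get(doc.policyId);
    if (current === undefined || doc.revisionIdx > current) {
      latest.set(doc.policyId, doc.revisionIdx);
    }
  }

  // A policy is out of sync when its deployed revision is missing or behind.
  const result: OutOfSyncPolicy[] = [];
  for (const policy of policies) {
    const deployed = latest.get(policy.id);
    if (deployed === undefined || deployed < policy.revision) {
      result.push({
        id: policy.id,
        expected: policy.revision,
        deployed: deployed === undefined ? null : deployed,
      });
    }
  }
  return result;
}
```

A result like this could back a UI indicator or a retry, depending on where the check runs.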

juliaElastic avatar Sep 02 '24 08:09 juliaElastic

/ci

nchaulet avatar Sep 03 '24 15:09 nchaulet

/ci

nchaulet avatar Sep 03 '24 19:09 nchaulet

Pinging @elastic/fleet (Team:Fleet)

elasticmachine avatar Sep 03 '24 19:09 elasticmachine

@elasticmachine merge upstream

nchaulet avatar Sep 03 '24 19:09 nchaulet

@elasticmachine merge upstream

nchaulet avatar Sep 04 '24 12:09 nchaulet

> LGTM, we should run some scale tests to see if multiple policy changes are affected in any way.

I will run some, but it should not affect the scale tests, as deploying a single policy is still a sync operation, and I think that is what we test in the multiple policy change scenario.

nchaulet avatar Sep 04 '24 12:09 nchaulet

> LGTM, we should run some scale tests to see if multiple policy changes are affected in any way.
>
> I will run some but it should not affect the scale tests as deploying a single policy is still a sync operation, and I think it's what we test in the multiple policy change scenario

Is there a condition to only use the task for multiple policies? I'm not seeing that in the code, only the feature flag condition.

juliaElastic avatar Sep 04 '24 13:09 juliaElastic

> Is there a condition to only use the task for multiple policies? I'm not seeing that in the code, only the feature flag condition.

The deployPolicies method has not been changed. The parts I changed are the methods that trigger a bulk deploy, for example _bumpPolicies.
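A rough sketch of that split, under stated assumptions: the single-policy path always deploys inline, while a bulk caller checks the feature flag and either schedules a task or falls back to the inline deploy. `DeployDeps`, `asyncDeployEnabled`, `deployNow` and `scheduleDeployTask` are hypothetical stand-ins and do not reflect the real signatures in the PR.

```typescript
// Hypothetical sketch: these names do not come from the Fleet code base.
interface DeployDeps {
  asyncDeployEnabled: boolean; // stands in for the feature flag
  deployNow: (policyIds: string[]) => void; // stands in for the unchanged deploy path
  scheduleDeployTask: (policyIds: string[]) => void; // stands in for scheduling the async task
}

type DeployMode = "sync" | "async";

// Single-policy changes never consult the flag and always deploy inline.
function deploySinglePolicy(deps: DeployDeps, policyId: string): DeployMode {
  deps.deployNow([policyId]);
  return "sync";
}

// Bulk callers are the only place where the flag is checked.
function bumpPolicies(deps: DeployDeps, policyIds: string[]): DeployMode {
  if (deps.asyncDeployEnabled) {
    deps.scheduleDeployTask(policyIds);
    return "async";
  }
  deps.deployNow(policyIds);
  return "sync";
}
```

Keeping the branch in the bulk callers means the condition is which method was called, not how many policies were passed in.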

nchaulet avatar Sep 04 '24 13:09 nchaulet

@elasticmachine merge upstream

nchaulet avatar Sep 04 '24 13:09 nchaulet

@elasticmachine merge upstream

nchaulet avatar Sep 04 '24 17:09 nchaulet

@elasticmachine merge upstream

nchaulet avatar Sep 05 '24 12:09 nchaulet

@elasticmachine merge upstream

nchaulet avatar Sep 09 '24 12:09 nchaulet

:green_heart: Build Succeeded

  • Buildkite Build
  • Commit: a69c4d97bed823fe5354940231a9dcbb2dbe336f
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-191839-a69c4d97bed8

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 1.8MB | 1.8MB | +16.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 169.7KB | 169.8KB | +23.0B |

History

  • :yellow_heart: Build #232308 was flaky ace46dcfe00050fc2725551b3978b9bd13fdab5d
  • :yellow_heart: Build #232217 was flaky a8a2d380e96b948ae4484944b98dd7023398d51d
  • :green_heart: Build #232041 succeeded a30513578ae2ce67dcc1eb85a9516a552b68a809
  • :green_heart: Build #231956 succeeded 99035210d800f2204f1dead4082cf67b252bf955
  • :green_heart: Build #231934 succeeded 45291b03de72ceb3b33662ea5170ea4e97d91c20

To update your PR or re-run it, just comment with: @elasticmachine merge upstream

kibana-ci avatar Sep 09 '24 13:09 kibana-ci