kibana [Fleet] Introduce Async deploy policies

Summary

Related https://github.com/elastic/ingest-dev/issues/3343

Deploying a lot of policies is an expensive operation in Fleet, as we need to fetch a lot of related data to build the .fleet-policies documents. This PR proposes moving that work to an async task for bulk operations.

@juliaElastic @kpollich before I move forward with this, do you think it's a direction we should follow, and do you see any issues with it?

Performance impact

(Tested locally on a MacBook.)

Updating the default output with 1000 agent policies:

With the feature flag on: ~3s (Screenshot 2024-08-30 at 9 57 02 AM)

With the feature flag off: ~44s (Screenshot 2024-08-30 at 9 58 31 AM)

That is a big difference, and local testing suggests this will allow Fleet to reach a much higher scale, such as 5k agent policies. (Screenshot 2024-08-30 at 2 05 40 PM)

Cons

This approach will make unexpected errors during deploys harder to catch, as they will only appear in the logs and no longer in the UI.

Aug 30 '24 14:08 nchaulet

:robot: GitHub comments

Just comment with:

/oblt-deploy : Deploy a Kibana instance using the Observability test environments.
run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Aug 30 '24 14:08 obltmachine

I think making policy deploys async makes a lot of sense; fleet-server picks up policies asynchronously anyway. As you stated, we should make it easy to find out from the logs/state when a deploy was not successful, and to retry (which the task seems to do already). Maybe we could store the latest deployed revision on the SO, or query it, to display in the UI when .fleet-policies has an outdated or out-of-sync latest revision. The UI already shows when agents use an outdated version of an agent policy.
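
The out-of-sync check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `AgentPolicyRevision`, `findOutOfSyncPolicies`, and the `deployedRevisions` lookup are hypothetical names, not existing Fleet code, and the lookup is assumed to have been built beforehand from the latest revision found in .fleet-policies for each policy.

```typescript
// Hypothetical shape: field names are illustrative, not the actual Fleet schema.
interface AgentPolicyRevision {
  id: string;
  revision: number; // latest revision stored on the agent policy saved object
}

// Returns the IDs of policies whose deployed revision lags behind the saved object.
// A policy with no deployed revision at all is treated as out of sync.
function findOutOfSyncPolicies(
  policies: AgentPolicyRevision[],
  deployedRevisions: Record<string, number> // policy ID -> latest deployed revision (assumed input)
): string[] {
  return policies
    .filter((policy) => (deployedRevisions[policy.id] ?? 0) < policy.revision)
    .map((policy) => policy.id);
}

const outOfSync = findOutOfSyncPolicies(
  [
    { id: 'policy-a', revision: 3 },
    { id: 'policy-b', revision: 2 },
    { id: 'policy-c', revision: 1 },
  ],
  { 'policy-a': 3, 'policy-b': 1 }
);
console.log(outOfSync); // policy-b is behind, policy-c was never deployed
```

The UI could then flag the returned policies instead of relying on the logs alone.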

Sep 02 '24 08:09 juliaElastic

/ci

Sep 03 '24 15:09 nchaulet

/ci

Sep 03 '24 19:09 nchaulet

Pinging @elastic/fleet (Team:Fleet)

Sep 03 '24 19:09 elasticmachine

@elasticmachine merge upstream

Sep 03 '24 19:09 nchaulet

@elasticmachine merge upstream

Sep 04 '24 12:09 nchaulet

> LGTM, we should run some scale tests to see if multiple policy changes are affected in any way.

I will run some, but it should not affect the scale tests: deploying a single policy is still a sync operation, and I think that is what we test in the multiple policy change scenario.

Sep 04 '24 12:09 nchaulet

> > LGTM, we should run some scale tests to see if multiple policy changes are affected in any way.
>
> I will run some but it should not affect the scale tests as deploying a single policy is still a sync operation, and I think it's what we test in the multiple policy change scenario

Is there a condition to only use the task for multiple policies? I'm not seeing that in the code, only the feature flag condition.

Sep 04 '24 13:09 juliaElastic

> Is there a condition to only use the task for multiple policies? I'm not seeing that in the code, only the feature flag condition.

The deployPolicies method has not been changed; the parts I changed are in the methods that trigger a bulk deploy, for example _bumpPolicies.
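
As a rough illustration of that split (not the PR's actual code), the branching can be pictured like this. `DeployDeps`, `deployOne`, `deployBulk`, and `scheduleDeployTask` are hypothetical stand-ins; only the idea that single deploys stay synchronous while bulk callers check the feature flag comes from the discussion.

```typescript
// Hypothetical sketch: the names below are illustrative stand-ins, not Fleet's actual code.
type DeployMode = 'sync' | 'async';

interface DeployDeps {
  asyncDeployEnabled: boolean; // stands in for the feature flag
  deployNow: (policyIds: string[]) => void; // existing synchronous deploy path
  scheduleDeployTask: (policyIds: string[]) => void; // enqueues a background task
}

// Single-policy deploys always stay synchronous, regardless of the flag.
function deployOne(deps: DeployDeps, policyId: string): DeployMode {
  deps.deployNow([policyId]);
  return 'sync';
}

// Bulk callers hand the work off to a task only when the flag is on.
function deployBulk(deps: DeployDeps, policyIds: string[]): DeployMode {
  if (deps.asyncDeployEnabled) {
    deps.scheduleDeployTask(policyIds);
    return 'async';
  }
  deps.deployNow(policyIds);
  return 'sync';
}

// Record calls in memory so the example runs without any real services.
const deployedBatches: string[][] = [];
const scheduledBatches: string[][] = [];
const exampleDeps: DeployDeps = {
  asyncDeployEnabled: true,
  deployNow: (ids) => { deployedBatches.push(ids); },
  scheduleDeployTask: (ids) => { scheduledBatches.push(ids); },
};

console.log(deployOne(exampleDeps, 'policy-1'));
console.log(deployBulk(exampleDeps, ['policy-2', 'policy-3']));
```

With the flag off, `deployBulk` falls back to the synchronous path, which matches the behavior before this change.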

Sep 04 '24 13:09 nchaulet

@elasticmachine merge upstream

Sep 04 '24 13:09 nchaulet

@elasticmachine merge upstream

Sep 04 '24 17:09 nchaulet

@elasticmachine merge upstream

Sep 05 '24 12:09 nchaulet

@elasticmachine merge upstream

Sep 09 '24 12:09 nchaulet

:green_heart: Build Succeeded

Buildkite Build
Commit: a69c4d97bed823fe5354940231a9dcbb2dbe336f
Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-191839-a69c4d97bed8

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`fleet`	1.8MB	1.8MB	+16.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`fleet`	169.7KB	169.8KB	+23.0B

History

:yellow_heart: Build #232308 was flaky ace46dcfe00050fc2725551b3978b9bd13fdab5d
:yellow_heart: Build #232217 was flaky a8a2d380e96b948ae4484944b98dd7023398d51d
:green_heart: Build #232041 succeeded a30513578ae2ce67dcc1eb85a9516a552b68a809
:green_heart: Build #231956 succeeded 99035210d800f2204f1dead4082cf67b252bf955
:green_heart: Build #231934 succeeded 45291b03de72ceb3b33662ea5170ea4e97d91c20

To update your PR or re-run it, just comment with: @elasticmachine merge upstream

Sep 09 '24 13:09 kibana-ci