kubernetes-elastic-agents

Implement agent reuse, toggled by a cluster profile config option

Open brandonvin opened this issue 2 years ago • 5 comments

Description

This implements agent reuse, following the approach outlined in these comments:

  • https://github.com/gocd/kubernetes-elastic-agents/issues/53#issuecomment-1399398498
  • https://github.com/gocd/kubernetes-elastic-agents/issues/53#issuecomment-1406967391

Resolves https://github.com/gocd/kubernetes-elastic-agents/issues/53

Cluster profile has an option to enable agent reuse, defaulting to false:

[Screenshot: GoCD cluster profile option, edit view]

[Screenshot: GoCD cluster profile option]

The naming and presentation of this option are open to feedback! 🙂

Changes

When pods are created, they are annotated with a hash of the elastic config. This ensures that when elastic config is changed, agents created from the old config will not be reused, and will eventually expire after the timeout.
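
To illustrate the idea, here is a minimal sketch of how a stable hash of the elastic config could be computed for the annotation. It is not the code from this PR: the class name `ElasticConfigHash`, the annotation key, and the choice of SHA-256 over sorted key/value pairs are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the names here are illustrative, not taken from the plugin.
final class ElasticConfigHash {
    // Hypothetical annotation key under which the hash could be stored on the pod.
    static final String ANNOTATION_KEY = "example.org/elastic-config-hash";

    // Returns a hex-encoded SHA-256 digest of the config entries.
    static String of(Map<String, String> elasticConfig) {
        // Sort by key so the hash does not depend on map iteration order.
        // A production version would also need an unambiguous encoding for
        // values that contain '=' or newlines; this sketch ignores that.
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(elasticConfig).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With a hash like this, two profiles with the same keys and values map to the same annotation value, and any changed value produces a different one, so pods created from the old config no longer match.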

When agent reuse is enabled, the main behavior changes are:

  • Job completion request -> instead of terminating the agent, mark the agent as idle so it may be considered for reuse
  • Should assign work request -> assign work only if the agent is available for work and the elastic config hash matches the agent
  • Create agent request -> only create an agent if none are available for reuse
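
The three behaviors above can be modeled roughly as follows. This is a simplified in-memory sketch, not the plugin's implementation: `AgentReuseSketch`, its method names, and the two-state `IDLE`/`BUILDING` model are hypothetical, and the sketch assumes that accepting work marks the agent as busy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the request handling; all names are illustrative.
final class AgentReuseSketch {
    enum State { IDLE, BUILDING }

    static final class Agent {
        final String configHash; // hash of the elastic config the pod was created from
        State state = State.IDLE;
        Agent(String configHash) { this.configHash = configHash; }
    }

    private final boolean reuseEnabled;
    private final Map<String, Agent> agents = new LinkedHashMap<>();
    private int created = 0;

    AgentReuseSketch(boolean reuseEnabled) { this.reuseEnabled = reuseEnabled; }

    // Create agent request: only create an agent if none is available for reuse.
    // Returns the new agent id, or empty if an existing idle agent can take the job.
    Optional<String> createAgent(String configHash) {
        if (reuseEnabled) {
            for (Agent a : agents.values()) {
                if (a.state == State.IDLE && a.configHash.equals(configHash)) {
                    return Optional.empty();
                }
            }
        }
        String id = "agent-" + (++created);
        agents.put(id, new Agent(configHash));
        return Optional.of(id);
    }

    // Should assign work request: accept only if the agent is idle and its
    // config hash matches the job's elastic config.
    boolean shouldAssignWork(String agentId, String configHash) {
        Agent a = agents.get(agentId);
        if (a == null || a.state != State.IDLE || !a.configHash.equals(configHash)) {
            return false;
        }
        a.state = State.BUILDING; // assumption: accepting work marks the agent busy
        return true;
    }

    // Job completion request: mark the agent idle instead of terminating it.
    void jobCompleted(String agentId) {
        if (reuseEnabled) {
            Agent a = agents.get(agentId);
            if (a != null) a.state = State.IDLE;
        } else {
            agents.remove(agentId); // reuse disabled: the agent is terminated
        }
    }
}
```

In this model, a second job with the same config hash finds the idle agent and no new pod is created, while a job with a different hash still gets a fresh agent.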

Other supporting changes:

  • KubernetesInstance is an immutable data class constructed by a builder, and no longer contains a Kubernetes client.
  • Field names (e.g. properties) are renamed to distinguish cluster profile properties from elastic profile properties.
  • Instance refreshing now happens inside the ServerPingRequestExecutor instead of "inline" in the KubernetesPlugin request handling block.
  • Added a request handler for REQUEST_PLUGIN_SETTINGS that returns an empty map - just to silence some warnings about this request not being implemented.
  • Some classes are refactored to allow testing with less mocking.
  • Removed an unnecessary semaphore.
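
For readers unfamiliar with the pattern in the first bullet, an immutable data class with a builder looks roughly like this. The class `InstanceSketch` and its fields are hypothetical and much smaller than the real KubernetesInstance; the point is only that state changes produce a new object instead of mutating a shared one.

```java
// Hypothetical, reduced example of an immutable data class built via a builder.
final class InstanceSketch {
    private final String podName;
    private final String configHash;
    private final boolean idle;

    private InstanceSketch(Builder b) {
        this.podName = b.podName;
        this.configHash = b.configHash;
        this.idle = b.idle;
    }

    String podName() { return podName; }
    String configHash() { return configHash; }
    boolean idle() { return idle; }

    static Builder builder() { return new Builder(); }

    // Copies this instance into a builder so one field can be changed
    // without touching the original object.
    Builder toBuilder() {
        return new Builder().podName(podName).configHash(configHash).idle(idle);
    }

    static final class Builder {
        private String podName;
        private String configHash;
        private boolean idle;

        Builder podName(String v) { this.podName = v; return this; }
        Builder configHash(String v) { this.configHash = v; return this; }
        Builder idle(boolean v) { this.idle = v; return this; }
        InstanceSketch build() { return new InstanceSketch(this); }
    }
}
```

Marking an instance idle then becomes `instance.toBuilder().idle(true).build()`, which leaves the original untouched and keeps such objects easy to construct in tests without a Kubernetes client.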

This ended up being a pretty large set of changes. Happy to explain in more detail if needed!

Testing

I updated and added many unit tests and ran them with ./gradlew test. I've also tested this by running GoCD on my machine against a Kubernetes cluster in kind.

brandonvin, May 20 '23 19:05

Thanks for this. Appreciate all the work here. Unfortunately I have limited capacity to review, try out and give useful insight so might take me some time 🙏

chadlwilson, Jun 01 '23 07:06

Nice work +1 on this

woopstar, Aug 21 '23 18:08