AgentBaker icon indicating copy to clipboard operation
AgentBaker copied to clipboard

Explicit configuration interface

Open maxwolffe opened this issue 1 year ago • 7 comments

Moving some private discussion I've had with @alexeldeib into a public ticket to increase bus factor.

Is your feature request related to a problem?/Why is this needed Our team uses a custom AKS image which has a few dependencies which are currently provided by AgentBaker's customnodedata:

We currently depend on the following files to be provided via nodecustomdata: /etc/default/kubelet /var/lib/kubelet/bootstrap-kubeconfig  /etc/kubernetes/certs/ca.crt

Within those files we depend on: /etc/default/kubelet:

  • node labels
  • kubelet command line parameters

/var/libe/kubelet/bootstrap-kubeconfg:

  • cluster-server:
name: localcluster
  cluster:
    certificate-authority: /etc/kubernetes/certs/ca.crt
    server: https://dev-azure-westus-xxx-cc782afe.hcp.westus.azmk8s
  • token
- name: kubelet-bootstrap
  user:
    token: "sbiizf.avcjfgfj5h3oni"
  • ca.crt - we depend on the whole file for the bootstrap kube-config.

We explicitly prevent the CustomScriptExtension from running by touching the /opt/azure/containers/provision.complete file which CSE checks prior to running. We don't want CSE to run because it does node level configuration which conflicts with our own.

Describe the solution you'd like in detail Ideally, the interface in AgentBaker we come up with:

  • Allows us to test that these files are present prior to CSE running 
  • Allows us to confirm that CSE does not run if the following file is already set: /opt/azure/containers/provision.complete 

There are a few options which stand out to me:

  1. Write a test just confirming that those files are populated in nodedata
  2. Write a new file interface (bootstrap.cfg) which either: a. Has those files included in it b. Has the fields we need only included
  3. Some mechanism to fetch these dynamically from within the node, so we can completely remove the dependency on data provided by AgentBaker (though we’d shift the dependency to requiring that the fields are accessible somehow)

After discussions with @alexeldeib - there's a preference to not have the data provided via customdata.

Describe alternatives you've considered

Additional context We've had a number of incidents due to us not having a clear contract around node configuration, so hoping to work with y'all to get one defined. Thanks in advance!

maxwolffe avatar Feb 28 '23 17:02 maxwolffe

@maxwolffe is preference to not have the data provided via customdata referring to your current file dependencies?

juan-lee avatar Mar 07 '23 15:03 juan-lee

I think the preference is to have a bootstrap.yaml that placed by customdata, and contains:

  • ca.crt
  • bootstrap token
  • API server host name
  • Node labels

then the node can bootstrap itself via custom logic (and no CSE needed)

yangl900 avatar Mar 07 '23 17:03 yangl900

@maxwolffe is preference to not have the data provided via customdata referring to your current file dependencies?

Hey @juan-lee ! Apologies, the message is a little unclear. We don't have a strong preference for how the data is provided, just that it's provided in some way that does not depend on CSE running, is accessible to services on the node, and is testable via some Agentbaker E2E test or unit test.

@alexeldeib had suggested that providing the files via customnodedata was not preferred because it's more challenging to update if parameters need to change.

Some possibilities include:

  • bootstrap.yml copied onto the node via customnodedata or userdata and placed in a consistent location. (as Anders mentioned) with our required fields - this would be our preferred option I think, since it seems easy to unit test.
  • The entire files in which the data is currently present (/etc/default/kubelet, /var/libe/kubelet/bootstrap-kubeconfg, etc) placed in a consistent location with tests.
  • Some mechanism for us to fetch the data at node bootstrap time (via IMDS or a clear API).

Happy to hop on a teams call or google meet to discuss if you'd like more context.

maxwolffe avatar Mar 07 '23 19:03 maxwolffe

Thanks for the explanation @maxwolffe. I'll sync up with @alexeldeib.

juan-lee avatar Mar 08 '23 13:03 juan-lee

Thanks for your help adding the above test and https://github.com/Azure/AgentBaker/pull/3104 unit test (was just finally looking into how to do this myself when I saw UtheMan's fix). Appreciate your help to add this protection.

maxwolffe avatar May 01 '23 16:05 maxwolffe

@maxwolffe I swear there was one more file I couldn't recall. I checked our messages but I only found those four. Do you know if I missed one?

also, expect some potential movement on these soon, which was part of the reason for the test ;)

alexeldeib avatar May 01 '23 16:05 alexeldeib

I think we have one dumb dependency which we'll soon be able to remove - kubelet.service - opened a PR to add it to the interface.

maxwolffe avatar May 01 '23 17:05 maxwolffe