cluster-api-provider-azure
cluster-api-provider-azure copied to clipboard
Enable configuration and disabling of boot diagnostics
/kind feature
Describe the solution you'd like [A clear and concise description of what you want to happen.]
Currently, as per the implementation from #901, boot diagnostics are enabled by default and use an Azure managed storage account. This storage account limits the boot logs to 1Gb per machine and has a cost of $0.05/perGB/Month.
You can also use your own storage account with boot diagnostics which would allow you to set customer retention policies for the data and also have more access to the underlying files being created should you need to.
I would like to propose that we add configuration options so that a user can configure their own storage account if they need to, but then also have the ability to disable the diagnostics should they not be interested.
In large deployments, eg a managed service running thousands of Machines, this data is costly and likely not always required (how often do we expect boot failures?), so I think it would be good to have the choice to disable the diagnostics as well to save money in large deployments.
I would expect an API something along the lines of, but happy to have discussion if others have suggestions:
diagnostics:
boot:
storageAccountType: AzureManaged | CustomerManaged | Disabled # defaults to AzureManaged for backwards compatibility
customerManaged: # This is only valid to be set when the account type is customer managed (a discriminated union)
storageAccountName: <blah>
Happy to volunteer to implement this if this is something maintainers feel comfortable supporting.
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- cluster-api-provider-azure version:
- Kubernetes version: (use
kubectl version
): - OS (e.g. from
/etc/os-release
):
No objections from my part to making this configurable and enabled by default to preserve back compat.
For the API, we might want to have storageUri
instead of storage account name since that's what the Azure APIs use. I would also avoid the terms "CustomerManaged" and "AzureManaged" and instead stick with the official terms, i.e. Managed
and UserManaged
https://docs.microsoft.com/en-us/azure/virtual-machines/boot-diagnostics.
The way the Azure API works is as follows, for reference:
// DiagnosticsProfile specifies the boot diagnostic settings state. <br><br>Minimum api-version:
// 2015-06-15.
type DiagnosticsProfile struct {
// BootDiagnostics - Boot Diagnostics is a debugging feature which allows you to view Console Output and Screenshot to diagnose VM status. <br>**NOTE**: If storageUri is being specified then ensure that the storage account is in the same region and subscription as the VM. <br><br> You can easily view the output of your console log. <br><br> Azure also enables you to see a screenshot of the VM from the hypervisor.
BootDiagnostics *BootDiagnostics `json:"bootDiagnostics,omitempty"`
}
// BootDiagnostics boot Diagnostics is a debugging feature which allows you to view Console Output and
// Screenshot to diagnose VM status. <br><br> You can easily view the output of your console log. <br><br>
// Azure also enables you to see a screenshot of the VM from the hypervisor.
type BootDiagnostics struct {
// Enabled - Whether boot diagnostics should be enabled on the Virtual Machine.
Enabled *bool `json:"enabled,omitempty"`
// StorageURI - Uri of the storage account to use for placing the console output and screenshot. <br><br>If storageUri is not specified while enabling boot diagnostics, managed storage will be used.
StorageURI *string `json:"storageUri,omitempty"`
}
That makes sense to me, on the StorageURI
front, the only question I have is, since that's a common format https://<account-name>.blob.core.windows.net/
, I was thinking we could make it easier for users by just doing that string substitution for them, rather than having them have to provider the full URI. Is that not something you would normally do in CAPZ?
It would be something we could do if we were confident no other formats were possible but given https://docs.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview that may already not be true.
Cool, I'll look into working up a PR over the next few days
/assign @JoelSpeed
Following up from the discussion, here is a PR that implements the agreed design: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2528
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
This is waiting on review