cloud-init
cloud-init copied to clipboard
`cloud-init status --long` reports `done degraded` and exits 2 on retries when accessing the Azure IMDS
Bug report
On some very particular cases™, running on Azure, I found cloud-init on a done degraded
state, and it messed up with the Pro Client CI.
Those cases happen when launching:
- Ubuntu Pro Noble (not the
-gen1
suffixed image from the marketplace, the one in pycloudlib); - On a
Standard_BS2
VM type; - When the instance happens to not have been pre-provisioned.
- most of the time. sometimes it is just OK.
This is pretty much a corner case, yes, but it affects us in the sense that we run many tests on our CI, one instance per test, and the results are now flaky because we expect cloud-init to succeed and to be in a good state.
This happens because cloud-init cannot access the metadata/reprovisiondata?api-version=2019-06-01'
on first try, so it retries at least once. But then, when the retry happens, a WARNING
is issued and cloud-init status' rc becomes nonzero.
The thing is that it does manage to access the IMDS a little later when the retry happens, and everything works fine cloud-init wise (and even instance-wise)
Maybe this is not an issue. But if it is, there are two interesting things to explore imho:
-
Just food for thought: It seems the Azure IMDS is taking a little longer, but not much, to respond on this particular instance type. Maybe that happens with other instance types? Or is there something wrong with the image? What may be causing that? Is that fixable or workaroundable? What are the downsides if this endpoint is actually unreachable?
-
From a cloud-init perspective, should retries that end up succeeding degrade the status? Should those retries be softer-warnings, or even
DEBUG
entries?
Steps to reproduce the problem
Launch many Ubuntu Pro Noble instances on Azure, using the BS2
machine type. I did it using pycloudlib.
The non-pre-provisioned ones (check with uptime
or cloud-init analyze show
) will present this issue most of the time.
Im far from an expert and sorry if there are docs about this I didn't find - but if there is a way of requesting Azure to launch a proper fresh instance everytime this may be easier to reproduce.
Environment details
- Cloud-init version:
24.1.3-0ubuntu3
- Operating System Distribution:
Ubuntu 24.04 LTS
- Cloud provider, platform or installer type: Azure
Standard_B2s
VM - Cloud image (publisher:product:plan):
Canonical:ubuntu-24_04-lts:ubuntu-pro:latest
cloud-init logs
2024-05-23 18:48:04,055 - azure.py[WARNING]: Polling IMDS failed attempt 1 with exception: UrlError('404 Client Error: Not Found for url: http://169.254.169.254/metadata/reprovisiondata?api-version=2019-06-01')
2024-05-23 18:48:05,064 - url_helper.py[DEBUG]: Read from http://169.254.169.254/metadata/reprovisiondata?api-version=2019-06-01 (200, 1920b) after 2 attempts
leads to
$ cloud-init status --long
status: done
extended_status: degraded done
boot_status_code: enabled-by-generator
last_update: Thu, 23 May 2024 18:48:20 +0000
detail: DataSourceAzure [seed=/dev/sr0]
errors: []
recoverable_errors:
WARNING:
- Polling IMDS failed attempt 1 with exception: UrlError('404 Client Error: Not Found for url: http://169.254.169.254/metadata/reprovisiondata?api-version=2019-06-01')
$ echo $?
2