Make GetCloudinitStatus() more robust
Describe the bug
Cloud-init now returns three return codes: 0 (success), 1 (irrecoverable error), and 2 (recoverable error) [1]. The status-parsing code needs to be improved so that it does not prematurely reboot the node when cloud-init has reported a recoverable error and is still running the cloud-final stage.
Expected behavior
open-vm-tools waits until cloud-init has completed the cloud-final stage, even when there are recoverable errors (schema parsing issues).
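To make the expected behavior concrete, here is a minimal Python sketch of the wait loop. It is not the open-vm-tools implementation (that code is C); the function name `wait_for_cloud_init`, the `get_status` callback, and the `max_polls` limit are all hypothetical. The 5-second interval is only inferred from the timestamps in the log in this report. The sketch assumes the status command's exit code and stdout are available as a pair.

```python
import time
from typing import Callable, Tuple


def wait_for_cloud_init(
    get_status: Callable[[], Tuple[int, str]],
    interval: float = 5.0,      # inferred from the log timestamps, not confirmed
    max_polls: int = 60,        # hypothetical upper bound for this sketch
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until cloud-init is no longer running.

    get_status returns (exit_code, stdout) of a `cloud-init status` call.
    Returns True once cloud-init stops reporting 'running', False if the
    poll limit is reached first.
    """
    for _ in range(max_polls):
        exit_code, stdout = get_status()
        # Exit code 2 (recoverable error) is treated like 0 here:
        # the stdout status still decides whether to keep waiting.
        still_running = exit_code in (0, 2) and "status: running" in stdout
        if not still_running:
            return True
        sleep(interval)
    return False
```

Passing `sleep` and `get_status` in as parameters keeps the loop testable without running cloud-init or actually waiting.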
Additional context
[1] https://cloudinit.readthedocs.io/en/latest/explanation/return_codes.html
In toolsDeployPkg.log, we can see that open-vm-tools currently treats "exitcode: 2" as a failure. For cloud-init, exit code 2 only indicates a recoverable error, and cloud-init continues running, so open-vm-tools should keep waiting when the cloud-init status command returns exit code 2.
[2025-06-05T02:16:30.607Z] [ info] sSkipReboot: 'false', forceSkipReboot 'false'.
[2025-06-05T02:16:30.607Z] [ info] Do not trigger reboot if cloud-init is executing.
[2025-06-05T02:16:30.607Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:30.607Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:30.607Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:30.607Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:30.808Z] [ info] Saving output from stdout
[2025-06-05T02:16:30.908Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:30.908Z] [ info] No more output from stdout
[2025-06-05T02:16:30.908Z] [ info] No more output from stderr
[2025-06-05T02:16:30.908Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:30.908Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:35.908Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:35.908Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:35.909Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:35.909Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:36.109Z] [ info] Saving output from stdout
[2025-06-05T02:16:36.209Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:36.209Z] [ info] No more output from stdout
[2025-06-05T02:16:36.209Z] [ info] No more output from stderr
[2025-06-05T02:16:36.209Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:36.209Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:41.210Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:41.210Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:41.210Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:41.210Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:41.513Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:41.513Z] [ info] Saving output from stdout
[2025-06-05T02:16:41.513Z] [ info] No more output from stdout
[2025-06-05T02:16:41.513Z] [ info] No more output from stderr
[2025-06-05T02:16:41.513Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:41.513Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:46.514Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:46.514Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:46.514Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:46.514Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:46.714Z] [ info] Saving output from stdout
[2025-06-05T02:16:46.814Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:46.814Z] [ info] No more output from stdout
[2025-06-05T02:16:46.814Z] [ info] No more output from stderr
[2025-06-05T02:16:46.814Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:46.814Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:51.815Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:51.815Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:51.815Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:51.816Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:52.116Z] [ info] Process exited normally after 0 seconds, returned 2
[2025-06-05T02:16:52.116Z] [ info] Saving output from stdout
[2025-06-05T02:16:52.116Z] [ info] No more output from stdout
[2025-06-05T02:16:52.116Z] [ info] No more output from stderr
[2025-06-05T02:16:52.116Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:52.116Z] [ error] Customization command failed with exitcode: 2, stderr: ''.
[2025-06-05T02:16:52.116Z] [ info] Unable to get cloud-init status.
[2025-06-05T02:16:52.116Z] [ info] Cloud-init execution is not on-going.
[2025-06-05T02:16:52.116Z] [ debug] Ran DeployPkg_DeployPackageFromFile successfully
[2025-06-05T02:16:52.116Z] [ debug] ## Closing log
[2025-06-05T02:16:52.116Z] [ debug] Command to exec : '/bin/readlink'.
[2025-06-05T02:16:52.116Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:52.117Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:52.117Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:52.217Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:52.217Z] [ info] Saving output from stdout
[2025-06-05T02:16:52.217Z] [ info] No more output from stdout
[2025-06-05T02:16:52.217Z] [ info] No more output from stderr
[2025-06-05T02:16:52.217Z] [ info] Customization command output: '../bin/systemctl '.
[2025-06-05T02:16:52.217Z] [ debug] /sbin/telinit is a soft link to systemctl
[2025-06-05T02:16:52.217Z] [ info] Trigger reboot.
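The last poll in the log is the interesting case: stdout still says 'status: running', but the exit code is 2, so the status is discarded and a reboot is triggered. As an illustration only, the decision could look roughly like the Python sketch below. The names `parse_status` and `is_cloud_init_running` are hypothetical and are not taken from the open-vm-tools source; the only output format assumed is the 'status: <value>' line seen in the log.

```python
from typing import Optional


def parse_status(stdout: str) -> Optional[str]:
    """Extract the value from a 'status: <value>' line, or None if absent."""
    for line in stdout.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None


def is_cloud_init_running(exit_code: int, stdout: str) -> bool:
    """Decide whether the caller should keep waiting for cloud-init.

    Exit codes 0 (success) and 2 (recoverable error) both mean the status
    output is usable. Any other exit code, including 1 (irrecoverable
    error), is treated here as 'not running'.
    """
    if exit_code not in (0, 2):
        return False
    return parse_status(stdout) == "running"
```

With this logic, the final poll in the log (exit code 2, output 'status: running') would return True, so the wait would continue instead of falling through to "Unable to get cloud-init status".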
Thanks for reporting this issue. We will get the Guest Customization team to triage this issue.
In the meantime, would you please provide the following information: guest OS and release, open-vm-tools version, and VM hypervisor used.
John, please see https://issues.redhat.com/browse/RHEL-96831 . I believe you guys can access the Jira. We can discuss this ticket in more detail, if required, in our next RH/VMware sync-up.
Thanks Ani.
Sorry, I cannot read the issue. Please make it accessible to the Broadcom team.
See https://github.com/canonical/cloud-init/issues/6280 instead. Has the same details.
Thanks @ani-sinha for bringing up this issue. Yes, the guest customization process currently treats only 0 as the expected return code of the command /usr/bin/cloud-init status.
I think you meant to mention https://github.com/canonical/cloud-init/pull/4500/ rather than https://github.com/canonical/cloud-init/pull/628. I will update the code here to support the new return codes.
I think you meant to mention canonical/cloud-init#4500 but canonical/cloud-init#628,
I meant to say https://github.com/canonical/cloud-init/issues/6280
@PengpengSun any updates on the fix?
@ani-sinha I've fixed this in our main branch; the change will be available in the next tools release.
Can you please merge it upstream so that we may back port?
Sure, let me check with @johnwvmw .