open-vm-tools

Make GetCloudinitStatus() more robust

Open ani-sinha opened this issue 5 months ago • 11 comments

Describe the bug

Cloud-init now returns three types of return codes: 0 (success), 1 (irrecoverable error) and 2 (recoverable error) [1]. The status parsing code needs to be improved so that it does not prematurely reboot the node when cloud-init has reported a recoverable error but is still running the cloud-final stage.
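
A minimal sketch of the exit-code rule this asks for. It is written in Python for brevity, while open-vm-tools itself is C. The names CloudInitExit and keep_checking_status are hypothetical, and only the 0/1/2 meanings come from the return-codes page in [1]. The point it illustrates is that 2 must be handled like 0, not like 1.

```python
from enum import IntEnum


class CloudInitExit(IntEnum):
    """cloud-init exit codes per the return-codes page [1] (names are hypothetical)."""
    SUCCESS = 0      # no errors
    CRITICAL = 1     # irrecoverable error
    RECOVERABLE = 2  # recoverable error(s); cloud-init carries on


def keep_checking_status(exit_code: int) -> bool:
    """Return True if the caller should go on to inspect the 'status:' line.

    Only an irrecoverable error (1) or an undocumented code ends the check.
    A recoverable error (2) is handled like success (0): cloud-init may
    still be running cloud-final, so rebooting now would be premature.
    """
    return exit_code in (CloudInitExit.SUCCESS, CloudInitExit.RECOVERABLE)
```

With this rule, exit code 2 no longer short-circuits to "Unable to get cloud-init status"; the decision moves to the status text itself.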

Expected behavior

open-vm-tools waits until cloud-init has completed the cloud-final stage, even when there are recoverable errors (schema parsing issues).

Additional context

  1. https://cloudinit.readthedocs.io/en/latest/explanation/return_codes.html

ani-sinha, Jul 03 '25 11:07

In toolsDeployPkg.log, we can see that it currently treats "exitcode: 2" as a failure. For cloud-init, this is just a recoverable error and cloud-init continues running, so open-vm-tools should keep waiting when the cloud-init status command returns exit code 2.

[2025-06-05T02:16:30.607Z] [ info] sSkipReboot: 'false', forceSkipReboot 'false'.
[2025-06-05T02:16:30.607Z] [ info] Do not trigger reboot if cloud-init is executing.
[2025-06-05T02:16:30.607Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:30.607Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:30.607Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:30.607Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:30.808Z] [ info] Saving output from stdout
[2025-06-05T02:16:30.908Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:30.908Z] [ info] No more output from stdout
[2025-06-05T02:16:30.908Z] [ info] No more output from stderr
[2025-06-05T02:16:30.908Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:30.908Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:35.908Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:35.908Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:35.909Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:35.909Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:36.109Z] [ info] Saving output from stdout
[2025-06-05T02:16:36.209Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:36.209Z] [ info] No more output from stdout
[2025-06-05T02:16:36.209Z] [ info] No more output from stderr
[2025-06-05T02:16:36.209Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:36.209Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:41.210Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:41.210Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:41.210Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:41.210Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:41.513Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:41.513Z] [ info] Saving output from stdout
[2025-06-05T02:16:41.513Z] [ info] No more output from stdout
[2025-06-05T02:16:41.513Z] [ info] No more output from stderr
[2025-06-05T02:16:41.513Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:41.513Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:46.514Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:46.514Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:46.514Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:46.514Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:46.714Z] [ info] Saving output from stdout
[2025-06-05T02:16:46.814Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:46.814Z] [ info] No more output from stdout
[2025-06-05T02:16:46.814Z] [ info] No more output from stderr
[2025-06-05T02:16:46.814Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:46.814Z] [ info] Cloud-init status is 'running'.
[2025-06-05T02:16:51.815Z] [ debug] Command to exec : '/usr/bin/cloud-init'.
[2025-06-05T02:16:51.815Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:51.815Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:51.816Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:52.116Z] [ info] Process exited normally after 0 seconds, returned 2
[2025-06-05T02:16:52.116Z] [ info] Saving output from stdout
[2025-06-05T02:16:52.116Z] [ info] No more output from stdout
[2025-06-05T02:16:52.116Z] [ info] No more output from stderr
[2025-06-05T02:16:52.116Z] [ info] Customization command output: 'status: running '.
[2025-06-05T02:16:52.116Z] [ error] Customization command failed with exitcode: 2, stderr: ''.
[2025-06-05T02:16:52.116Z] [ info] Unable to get cloud-init status.
[2025-06-05T02:16:52.116Z] [ info] Cloud-init execution is not on-going.
[2025-06-05T02:16:52.116Z] [ debug] Ran DeployPkg_DeployPackageFromFile successfully
[2025-06-05T02:16:52.116Z] [ debug] ## Closing log
[2025-06-05T02:16:52.116Z] [ debug] Command to exec : '/bin/readlink'.
[2025-06-05T02:16:52.116Z] [ info] sizeof ProcessInternal is 56
[2025-06-05T02:16:52.117Z] [ info] Returning, pending output from stdout
[2025-06-05T02:16:52.117Z] [ info] Returning, pending output from stderr
[2025-06-05T02:16:52.217Z] [ info] Process exited normally after 0 seconds, returned 0
[2025-06-05T02:16:52.217Z] [ info] Saving output from stdout
[2025-06-05T02:16:52.217Z] [ info] No more output from stdout
[2025-06-05T02:16:52.217Z] [ info] No more output from stderr
[2025-06-05T02:16:52.217Z] [ info] Customization command output: '../bin/systemctl '.
[2025-06-05T02:16:52.217Z] [ debug] /sbin/telinit is a soft link to systemctl
[2025-06-05T02:16:52.217Z] [ info] Trigger reboot.
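
The log shows the failure mode: the 02:16:52 call to cloud-init status still printed 'status: running ' but exited with 2, so the caller gave up and triggered a reboot. Below is a sketch of a polling loop that lets the status line, not the exit code alone, decide whether to keep waiting. It assumes a hypothetical run_status() callable that runs cloud-init status once and returns (exit_code, stdout). The 5-second interval mirrors the log; the one-hour timeout is an arbitrary assumption, and none of these names come from open-vm-tools.

```python
import time
from typing import Callable, Tuple

# Hypothetical runner: executes `cloud-init status` once and returns
# (exit_code, stdout). How the process is spawned is left to the caller.
StatusRunner = Callable[[], Tuple[int, str]]


def parse_status(stdout: str) -> str:
    """Return the value of the 'status:' line, or '' if none is present."""
    for line in stdout.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return ""


def wait_for_cloud_init(run_status: StatusRunner,
                        interval: float = 5.0,
                        timeout: float = 3600.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `cloud-init status` until cloud-init stops reporting 'running'.

    Returns the final status value (for example 'done' or 'error'), or ''
    if the output had no status line. Raises TimeoutError if cloud-init is
    still running after `timeout` seconds.
    """
    waited = 0.0
    while True:
        exit_code, stdout = run_status()
        status = parse_status(stdout)
        if exit_code not in (0, 2):
            # 1 = irrecoverable error; any other code is undocumented.
            return status or "error"
        # 0 and 2 are both normal here: 2 only means recoverable errors were
        # seen, so the status line decides whether to keep waiting.
        if status != "running":
            return status
        if waited >= timeout:
            raise TimeoutError(f"cloud-init still running after {waited:.0f} s")
        sleep(interval)
        waited += interval
```

Passing sleep in as a parameter is only there to make the sketch easy to exercise without real delays.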

xiachen-rh, Jul 03 '25 13:07

Thanks for reporting this issue. We will get the Guest Customization team to triage this issue.

In the meantime, would you please provide the following information:
guest OS and release:
open-vm-tools version:
VM hypervisor used:

johnwvmw, Jul 03 '25 13:07

John, please see https://issues.redhat.com/browse/RHEL-96831. I believe you can access the Jira. We can discuss this ticket in more detail, if required, at our next RH/VMware sync-up.

ani-sinha, Jul 03 '25 13:07

Thanks Ani.

Sorry, I cannot read the issue. Please make it accessible to the Broadcom team.

johnwvmw, Jul 03 '25 13:07

See https://github.com/canonical/cloud-init/issues/6280 instead. It has the same details.

ani-sinha, Jul 03 '25 14:07

Thanks @ani-sinha for bringing up this issue. Yes, the guest customization process currently treats 0 as the only expected return code of the command /usr/bin/cloud-init status. I think you meant to mention https://github.com/canonical/cloud-init/pull/4500/ rather than https://github.com/canonical/cloud-init/pull/628. I will update the code here to support the new return codes.

PengpengSun, Jul 04 '25 02:07

I meant to say https://github.com/canonical/cloud-init/issues/6280

ani-sinha, Jul 04 '25 03:07

@PengpengSun any updates on the fix?

ani-sinha, Sep 01 '25 09:09

@ani-sinha I've fixed this in our main branch; the change will be available in the next tools release.

PengpengSun, Sep 01 '25 10:09

Can you please merge it upstream so that we can backport it?

ani-sinha, Sep 01 '25 10:09

Sure, let me check with @johnwvmw .

PengpengSun, Sep 01 '25 10:09