open-vm-tools icon indicating copy to clipboard operation
open-vm-tools copied to clipboard

post-customize.service does not proper work due to time sync via ntp or chronyc after reboot

Open 4815162342lost opened this issue 1 year ago • 4 comments

Describe the bug

I am not sure, but i think that sometimes i had bug with post-customize service due to time synchronization

My case: RHEL 7.9.

On first boot (without network) chronyc does not work.

From logs: /var/log/vmware-imc/toolsDeployPkg.log (My local time +3ITC, so, 14:10 is 17:10)

2023-03-16T14:10:52 DEBUG: Command: '/bin/touch /.guest-customization-post-reboot-pending' 
2023-03-16T14:10:52 DEBUG: Exit Code: 0 

After that server go to reboot: [2023-03-16T14:10:52.393Z] [ debug] Command to exec : '/sbin/telinit'.

Rebooted, then server started and post-customize service and cronyc stared simultaneously.

Mar 16 17:11:04 vm-lt2-bd-kafka-1609 systemd[1]: Started Run post-customization script.
Mar 16 17:11:04 vm-lt2-bd-kafka-1609 chronyd[739]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)

But chronyc change my time:

Mar 16 17:11:08 vm-lt2-bd-kafka-1609 chronyd[739]: System clock wrong by -9.988691 seconds, adjustment started
Mar 16 17:10:58 vm-lt2-bd-kafka-1609 chronyd[739]: System clock was stepped by -9.988691 seconds

And /root/.customization/post-customize-guest.sh due to following line in script: if [ $now -lt $guardTime ]; then

Exit, because current time ($now) less or equal to creation time +10 sec of '/.guest-customization-post-reboot-pending' file

Logs:

Mar 16 17:10:59 customize-guest: Post-reboot agent started
Mar 16 17:10:59 customize-guest: The argument is ''
Mar 16 17:10:59 customize-guest: Running as non-service
Mar 16 17:10:59 customize-guest: Execution continues...
Mar 16 17:10:59 customize-guest: Waiting the GOS to reboot. Skipping

How can we avoid this? I think than post-customize-guest.sh must start before ntp\chrony?

Reproduction steps

Configure chronyc and make hw clock "real time + 1 min"

Expected behavior

Post-customize will work correctly

Additional context

FIX: maybe change post-customize service to start before ntp\chronyc\systemd-timesync?

No response

4815162342lost avatar Mar 16 '23 15:03 4815162342lost

Hi 4815162342lost,

Yes, post-customize service and cronyc stared simultaneously lead to this issue. So far, it seems that change post-customize service to start before ntp\chronyc\systemd-timesync is a good idea. I am investigating to add Before=chronyd.service in post-customize.service, I will reply you later when having some results.

Best Regards, Cheng Cheng

chengcheng-chcheng avatar Mar 20 '23 08:03 chengcheng-chcheng

Hi 4815162342lost,

Here is a workaround to run the post-customize in the current problematic scenario. Could you try to reboot the vm again when the issue happened and check whether the post-customize is running after this time reboot? As when the issue happened, the post customize script was skipped as the log "Waiting the GOS to reboot. Skipping" mentioned, but the guest-customization-post-reboot-pending is still exist, if you reboot the vm again, the post-customize script should run again.

Best Regards, Cheng Cheng

chengcheng-chcheng avatar Mar 21 '23 09:03 chengcheng-chcheng

Hello, thanks for your reply.

Yes, your workaround works, i checked earlier.

But we have several automation scripts and etc. Workaround unwanted for our infrastructure.

But if there no any proper way to fix issue, we can add "Second reboot" to our infrastructure.

Or could you add "Second reboot" if needed to /root/.customization/post-customize-guest.sh script? /root/.customization/post-customize-guest.sh should check, if issue with time sync occurred, and make second reboot or restart post-customize service after several time if needed. Or something else.

But main problem, workaround will not work, if chronyc will change time to significant amount (for example, on 5-10 min.) I.e. need "proper fix".

4815162342lost avatar Mar 21 '23 09:03 4815162342lost

Hi 4815162342lost, Yes, the 2nd reset workaround is only a temporary way, we will further investigate how to fix the problem of this scenario from the code. Thanks for the details.

chengcheng-chcheng avatar Mar 23 '23 06:03 chengcheng-chcheng