dnf-plugins-core

Needs restarting boot time

Open RomanSoloweow opened this issue 1 year ago • 16 comments

Hello, we are using Rocky Linux versions 9.2 and 9.4 with the latest available version of the ‘needs-restarting’ plugin — 4.3.0. Recently, we encountered an issue with ‘needs-restarting’.

If your virtual machine is hosted on VMware vSphere and you revert to snapshots, then /proc/1 and /proc/stat may contain completely different dates, leading to incorrect operation of the ‘needs-restarting’ plugin.

image

RomanSoloweow avatar Feb 24 '25 10:02 RomanSoloweow

cc @kontura @ppisar

RomanSoloweow avatar Feb 24 '25 11:02 RomanSoloweow

That's worrisome. Could you explain how "VMware vSphere revert to snapshots" is implemented? Your screenshot hints that a kernel is newly booted and a previously frozen userspace is thawed.

ppisar avatar Feb 24 '25 16:02 ppisar

By the way, RHEL 9 is going to get another method of obtaining a boot time from systemd https://issues.redhat.com/browse/RHEL-14900. You can check a build for CentOS Stream https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/. Maybe it resolves the problem for you.

ppisar avatar Feb 24 '25 16:02 ppisar

I need both methods, reboot-hint and services

RomanSoloweow avatar Feb 24 '25 18:02 RomanSoloweow

I'm not sure, but I think the machine is copied in a "frozen state" and then the copy replaces the main version which is then "thawed."

RomanSoloweow avatar Feb 24 '25 18:02 RomanSoloweow

Using KVM/libvirt instead of VMware vSphere, I did notice a difference in grep ^btime /proc/stat on a RHEL 9.5 VM running kernel 5.14.0-503.16.1.el9_5.x86_64, after resuming from a snapshot.

The new preferred method for detecting boot time used by needs-restarting in RHEL 9.6+ is based on systemd. Specifically, it uses the UnitsLoadStartTimestamp property on /org/freedesktop/systemd1. Resuming my 9.5 VM from a snapshot had no effect on the UnitsLoadStartTimestamp, so if vSphere works similarly enough to libvirt/qemu/KVM, this bug should already be fixed in the next RHEL release.

Strangely, on a CentOS Stream 9 VM running kernel 5.14.0-570.el9.x86_64, I did not see a difference in grep ^btime /proc/stat after resuming a snapshot. So maybe some other change also fixed this bug?
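For anyone who wants to compare the two values on their own VM before and after a snapshot revert, here is a rough sketch. It is not the plugin's code; the function names are made up for illustration, and it assumes that busctl get-property prints the property as a type signature followed by a microsecond value (for example "t 1740400000000000").

```python
import subprocess


def parse_btime(proc_stat_text: str) -> int:
    """Return the btime value (seconds since the epoch) from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def parse_busctl_timestamp(output: str) -> float:
    """Convert busctl output such as 't 1740400000000000' to seconds.

    Assumption: the value is an unsigned 64-bit count of microseconds.
    """
    type_sig, value = output.split()
    if type_sig != "t":
        raise ValueError(f"unexpected type signature: {type_sig}")
    return int(value) / 1_000_000


def systemd_units_load_start() -> float:
    """Ask systemd for UnitsLoadStartTimestamp via busctl (needs a running systemd)."""
    result = subprocess.run(
        [
            "busctl", "get-property",
            "org.freedesktop.systemd1",
            "/org/freedesktop/systemd1",
            "org.freedesktop.systemd1.Manager",
            "UnitsLoadStartTimestamp",
        ],
        check=True, capture_output=True, text=True,
    )
    return parse_busctl_timestamp(result.stdout)


def kernel_btime() -> int:
    """Read btime from the live /proc/stat."""
    with open("/proc/stat") as f:
        return parse_btime(f.read())
```

Calling kernel_btime() and systemd_units_load_start() before and after resuming a snapshot should show whether either value moves on a given hypervisor.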

evan-goode avatar Feb 27 '25 21:02 evan-goode

Hmm, except for needs-restarting (no arguments) we still measure process start times relative to the btime in /proc/stat. Too bad it's unreliable. /proc/uptime also seems to be incorrect after resuming a snapshot. If we do not have a reliable way to get the kernel boot time (from which process start times are measured), there's not much we can do here...

evan-goode avatar Feb 27 '25 21:02 evan-goode

Yes, I think these methods vary depending on the user's configuration, and there is no universal method. I suggest making the boot time source configurable: /proc/1, /proc/stat, or systemd. This could be an argument to the command, or the command could accept the boot time directly. This way, everyone can choose the method that works in their specific configuration.
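Roughly, the idea could look like the sketch below. Everything here is hypothetical: BootTimeSource, its values, and get_boot_time are names invented for illustration and do not exist in needs-restarting today. The actual readers are passed in as callables so the selection logic stays independent of how each source is read.

```python
from enum import Enum
from typing import Callable, Dict


class BootTimeSource(Enum):
    """Hypothetical option values; not an existing needs-restarting setting."""
    PROC_1 = "proc-1"        # mtime of /proc/1
    PROC_STAT = "proc-stat"  # btime line in /proc/stat
    SYSTEMD = "systemd"      # UnitsLoadStartTimestamp from systemd


def get_boot_time(
    source: BootTimeSource,
    readers: Dict[BootTimeSource, Callable[[], float]],
) -> float:
    """Return the boot time (seconds since the epoch) from the chosen source."""
    try:
        reader = readers[source]
    except KeyError:
        raise ValueError(f"no reader configured for {source.value}") from None
    return reader()
```

A command-line argument could then map a string such as "proc-stat" to BootTimeSource("proc-stat") and pass it through.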

RomanSoloweow avatar Feb 28 '25 08:02 RomanSoloweow

The bigger issue is that we can't get correct process start times. Currently we are using column 22 (1-indexed) of /proc/pid/stat to get process start times, which only gives us the uptime of the kernel when the process was started. If the uptime stops counting (e.g. the VM is paused or is restored from an earlier snapshot), then it's not useful. To illustrate this, imagine the VM gets paused for one day. Before and after the pause, the uptime is exactly the same. So just knowing the uptime when a process was started doesn't tell you the wall clock time when it started, since it could have been started on either side of the pause, and you don't know when the pause happened.
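To make the arithmetic concrete, here is a minimal sketch of that calculation. It is an illustration rather than the plugin's actual code, and the function name is made up. Field 22 is counted in clock ticks since boot, so it is divided by the tick rate (os.sysconf("SC_CLK_TCK") on a live system) and added to btime.

```python
def process_start_epoch(stat_line: str, btime: int, ticks_per_second: int) -> float:
    """Estimate a process's start time in seconds since the epoch.

    stat_line is the content of /proc/<pid>/stat. The command name (field 2)
    may contain spaces or parentheses, so split after the last ')'.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 (starttime) is rest[19].
    starttime_ticks = int(rest[19])
    return btime + starttime_ticks / ticks_per_second
```

The result depends only on btime and the uptime tick count, so any pause or snapshot revert that leaves the two out of step with the wall clock shifts every computed start time.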

Like needs-restarting, ps -o start also reads /proc/pid/stat[22] and gives incorrect readings when pausing/resuming VMs.

Reading the mtime or ctime of /proc/pid (stat -c %Z /proc/pid) seems to give us the absolute timestamps we want, and these are unaffected by pausing/resuming the VM, but the last time Petr and I checked, their behavior was not well-defined: https://github.com/rpm-software-management/dnf-plugins-core/pull/536.
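In Python, the equivalent of stat -c %Z is the st_ctime field of os.stat. A small sketch for experimenting with it follows; the function name and the proc_root parameter are illustrative only, and the caveat about the value not being well-defined still applies.

```python
import os


def proc_entry_ctime(pid: int, proc_root: str = "/proc") -> float:
    """Return the ctime (seconds since the epoch) of the /proc/<pid> directory."""
    return os.stat(os.path.join(proc_root, str(pid))).st_ctime
```

For example, proc_entry_ctime(os.getpid()) can be compared against the value derived from /proc/pid/stat[22] before and after pausing a VM.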

evan-goode avatar Feb 28 '25 20:02 evan-goode

Yes, but /proc/stat contains the correct value in my case.

That’s why I’m suggesting introducing a BootTimeSource parameter and, depending on its value, retrieving the boot time from either /proc/1 or /proc/stat.

Otherwise, I’d have to implement my own copy of the needs-restarting logic, which sounds terrible.

@evan-goode, @ppisar what do you think about it?

RomanSoloweow avatar Mar 12 '25 12:03 RomanSoloweow

> Yes, but /proc/stat contains the correct value in my case.

If the btime in your /proc/stat is correct and /proc/pid/stat[22] is also correct, and the only problem in your case is that the boot time is sometimes derived from the incorrect /proc/1 mtime, then your problem has already been fixed upstream (https://github.com/rpm-software-management/dnf-plugins-core/pull/560) and will arrive in RHEL 9.6.

After that patch, the boot time from systemd will be preferred (which seems to stay correct after resuming VMs), and process start times will always be measured relative to the btime in /proc/stat (before that patch, they could be measured relative to /proc/1 mtime).

evan-goode avatar Mar 14 '25 13:03 evan-goode

But what if we use Rocky Linux 9.2?

RomanSoloweow avatar Mar 14 '25 23:03 RomanSoloweow

@evan-goode is it possible to release the same fix for Rocky 9.2?

RomanSoloweow avatar Mar 19 '25 08:03 RomanSoloweow

That's a question for the vendor of Rocky Linux (https://wiki.rockylinux.org/rocky/bugs/), not for the upstream project.

ppisar avatar Mar 19 '25 08:03 ppisar

@ppisar @evan-goode

When will it be released for RHEL 9.6? The last update was 4 months ago.

RomanSoloweow avatar Mar 20 '25 09:03 RomanSoloweow

It will be released when RHEL 9.6 is released. Rough timing is at https://access.redhat.com/support/policy/updates/errata/#RHEL9_Planning_Guide. If you need it now, you can try builds from CentOS Stream https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/.

ppisar avatar Mar 20 '25 09:03 ppisar