The timestamps in the database do not seem to correlate to the system clock

Open kaylalarson2019 opened this issue 5 years ago • 1 comments

Any time an error occurs and a database entry is created, we need a timestamp that reflects the time the error occurred. The timestamps we are currently seeing are very odd. Are they associated with uptime?

For example, the output looked like this:

ras-mc-ctl --errors

No PCIe AER errors.

No Extlog errors.

MCE events:

1 2449-05-18 20:57:15 +0000 error: MEMORY CONTROLLER RD_CHANNELunspecified_ERR Transaction: Memory read error, mcg mcgstatus=0, mci Corrected_error Error_enabled, n_errors=0, mcgcap=0x00000c09, status=0x940000000000009f, addr=0xffa84200, tsc=0x2ea83b5fdb8, walltime=0x5e7e5045, cpuid=0x000506f1, bank=0x00000001

2 2468-02-20 22:57:15 +0000 error: MEMORY CONTROLLER RD_CHANNELunspecified_ERR Transaction: Memory read error, mcg mcgstatus=0, mci Corrected_error Error_enabled, n_errors=0, mcgcap=0x00000c09, status=0x940000000000009f, addr=0xffa84200, tsc=0x308d6c79418, walltime=0x5e7e5080, cpuid=0x000506f1, bank=0x00000001

Is there a way to correlate the timestamp with the clock?

Mar 27 '20 19:03 kaylalarson2019

Whenever possible, rasdaemon tries to use the timestamps that come from the kernel. This way, they should match the timestamps shown in dmesg.

The logic that tries to ensure this is inside the select_tracing_timestamp() function: if the Linux kernel supports it (version 3.10 and later), it sets the tracing events to use the uptime clock.

Then it estimates the difference between that clock and the time reported in userspace, and uses that difference internally.

Since MCE events won't be doing this, I would expect some differences.
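The offset estimation described above can be sketched roughly as follows. This is a minimal illustration, not rasdaemon's actual code: the function names `read_uptime`, `estimate_offset`, and `to_wall_clock` are hypothetical, and it assumes the uptime value is read from `/proc/uptime` on Linux.

```python
import time


def read_uptime(path: str = "/proc/uptime") -> float:
    """Return seconds since boot from the first field of /proc/uptime (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])


def estimate_offset(wall_now: float, uptime_now: float) -> float:
    """Difference between the wall clock (epoch seconds) and the uptime clock."""
    return wall_now - uptime_now


def to_wall_clock(event_uptime: float, offset: float) -> float:
    """Convert an uptime-based event timestamp to epoch seconds."""
    return event_uptime + offset


if __name__ == "__main__":
    # Sample both clocks as close together as possible, then reuse the offset.
    offset = estimate_offset(time.time(), read_uptime())
    # Hypothetical event that fired 10 seconds ago on the uptime clock.
    event_uptime = read_uptime() - 10.0
    print(time.ctime(to_wall_clock(event_uptime, offset)))
```

The conversion is only meaningful when the event timestamp is on the same clock that was used to estimate the offset. Applying the offset to a timestamp taken from a different clock would produce a date that does not match the system clock.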

> Is there a way to correlate the timestamp with the clock?

Right now, no, but it should be possible to add a printk or an event that would store this difference in the rasdaemon database.
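Separately, each record quoted in the report carries a walltime field. A minimal sketch, assuming that field holds seconds since the Unix epoch (an assumption; nothing in this thread confirms it), decodes the two values. The helper name `decode_walltime` is hypothetical.

```python
from datetime import datetime, timezone


def decode_walltime(hex_value: str) -> datetime:
    """Interpret a hex walltime field as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


# walltime values copied from the two MCE records in the report.
for field in ("0x5e7e5045", "0x5e7e5080"):
    print(field, decode_walltime(field).isoformat())
```

Under that assumption, both values decode to 2020-03-27 UTC, the date the issue was reported, and they are 59 seconds apart.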

Jul 21 '20 13:07 mchehab