rasdaemon not logging

Open DonKatsu opened this issue 1 year ago • 9 comments

Distro: Fedora 37 KDE
Kernel: 6.1.8
rasdaemon version: 0.6.8
CPU: Ryzen 9 5900x

Because rasdaemon was bloating my log with erroneous disk error reports, I deleted the files ras-mc_event.db and ras-mc_event.db-journal in /var/lib/rasdaemon. After I restarted the rasdaemon service, clean ones were created. Since then it has stopped logging those false disk errors. Eventually I got another MCE and noticed that it wasn't logged either. (I'm not sure whether deleting the files was directly related, but the timing lined up.) To be sure, I reinstalled rasdaemon and waited for another one to happen. This latest one wasn't logged either, and the supposed disk errors still aren't being logged, even though the service still seems to be reporting them. Screenshot. Journal log of a systemctl restart rasdaemon. Those core dumps happen on a fresh boot as well.
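To check whether anything is actually being written to the recreated database, the file can be inspected directly. The following is a minimal sketch, assuming ras-mc_event.db is an ordinary SQLite file; the helper name `table_row_counts` is made up for illustration, and the sketch deliberately assumes nothing about rasdaemon's table names by listing whatever tables it finds.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Build a file: URI so the database can be opened read-only;
    # inspecting the file this way cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `table_row_counts("/var/lib/rasdaemon/ras-mc_event.db")` as root before and after an event would show whether any table gained rows; counts that never change would be consistent with events not being recorded.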

I have uninstalled mcelog, and I don't have the ras-mc-ctl service enabled since it fails and exits due to my system not having ECC memory.

DonKatsu avatar Jan 31 '23 20:01 DonKatsu

There is a known regression with Kernel 6.1. The fix depends on both adding a patch to the Linux Kernel and a change in rasdaemon. See: https://github.com/mchehab/rasdaemon/commit/6986d818e6d2c846c001fc7211b5a4153e5ecd11

The Kernel patch was already merged and backported to Kernel 6.1.12: https://lwn.net/Articles/923307/.

Today I merged the rasdaemon patch and released version 0.8.0, but the Fedora packages don't contain the regression fix yet.

I'm planning to cherry-pick the fix and apply it to Fedora 36 and 37 later today.

In the meantime, if you want to test, you can either wait for 6.1.12 or download it from koji, then build rasdaemon from the sources using make mock and install the package from the SPRMS/ directory.

mchehab avatar Feb 18 '23 09:02 mchehab

I added a Fedora 37 package, based on version 0.6.8: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e1ccb95257. Still, I'd appreciate feedback on version 0.8.0 as well, since it now uses libtraceevent.

mchehab avatar Feb 18 '23 12:02 mchehab

I now have both kernel 6.1.12, and rasdaemon 0.6.8 which hit Fedora's stable repo last night.

As expected, rasdaemon 0.6.8 no longer segfaults. But now it triggers a SELinux denial for the dac_override capability when it's started. Still, the rasdaemon processes are alive and the service is active (running).

After getting rasdaemon 0.6.8 and checking ras-mc-ctl --errors I saw these reported disk errors. I hadn't checked it since making this issue, so I have to assume they were made when stated. The last modified dates for ras-mc_event.db and ras-mc_event.db-journal are the 23rd and 24th respectively. An hour before the 8th's entries, I had upgraded to kernel 6.1.10 and likely immediately restarted.

DonKatsu avatar Feb 27 '23 16:02 DonKatsu

Hello,

The dac_override capability is requested on an access attempt where the DAC permissions do not allow the access, and it usually indicates a problem with the permissions. Please use strace to locate the files, or turn on full auditing to gather more information:

1) Open the /etc/audit/rules.d/audit.rules file in an editor.
2) Remove the following line if it exists:
-a task,never
3) Add the following line to the end of the file:
-w /etc/shadow -p w
4) Restart the audit daemon:
  # service auditd restart
5) Re-run your scenario.
6) Collect AVC denials:
  # ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today

zpytela avatar Apr 28 '23 10:04 zpytela

I finally had another MCE event while still on Fedora 37 with rasdaemon 0.6.8. Right after the kernel reported the MCE, rasdaemon crashed and restarted 5 times before finally settling down and throwing its SELinux denial. (Though apparently there was a denial for each crash.) The MCE did not show up when checking with ras-mc-ctl --errors. Here's the journal from the event: Gist

Also, for some reason it's saying "rasdaemon: Old kernel detected. Stop listening and fall back to pthread way." despite being on kernel 6.2.11 there. It still says that on Fedora 38 with kernel 6.2.13.

I've now updated to Fedora 38, and have rasdaemon 0.8.0.

@zpytela This is what I get after following that and restarting rasdaemon 0.8.0:

ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today
----
type=AVC msg=audit(05/02/23 13:11:25.905:1301) : avc:  denied  { dac_override } for  pid=543881 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:26.409:1315) : avc:  denied  { dac_override } for  pid=543931 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:26.896:1329) : avc:  denied  { dac_override } for  pid=543984 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:27.405:1343) : avc:  denied  { dac_override } for  pid=544029 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:27.911:1359) : avc:  denied  { dac_override } for  pid=544087 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 17:23:30.819:111) : avc:  denied  { dac_override } for  pid=3215 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=PROCTITLE msg=audit(05/02/23 17:43:14.738:305) : proctitle=/usr/sbin/rasdaemon -f -r 
type=PATH msg=audit(05/02/23 17:43:14.738:305) : item=0 name=/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent inode=56828 dev=00:0c mode=file,440 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tracefs_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=CWD msg=audit(05/02/23 17:43:14.738:305) : cwd=/ 
type=SYSCALL msg=audit(05/02/23 17:43:14.738:305) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=AT_FDCWD a1=0x7ffee456a810 a2=O_WRONLY a3=0x0 items=1 ppid=1 pid=14786 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rasdaemon exe=/usr/sbin/rasdaemon subj=system_u:system_r:rasdaemon_t:s0 key=(null) 
type=AVC msg=audit(05/02/23 17:43:14.738:305) : avc:  denied  { dac_override } for  pid=14786 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
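
With this many near-identical records, it can help to collapse them into counts. The following is a rough sketch, not part of any audit tooling: it assumes the text has the shape shown above (an `avc:  denied  { ... }` clause followed by `comm=` and `tclass=` fields), and the function name `summarize_denials` is made up for illustration.

```python
import re
from collections import Counter

# Matches the fields of interest in an AVC denial line as printed by ausearch.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"comm=(?P<comm>\S+).*?tclass=(?P<tclass>\S+)"
)


def summarize_denials(text):
    """Count denials per (comm, tclass, permission) in ausearch output."""
    counts = Counter()
    for line in text.splitlines():
        match = AVC_RE.search(line)
        if match is None:
            continue  # PATH, SYSCALL, CWD and separator lines are skipped
        # Raw (non -i) output quotes the command name, so strip any quotes.
        comm = match.group("comm").strip('"')
        for perm in match.group("perms").split():
            counts[(comm, match.group("tclass"), perm)] += 1
    return counts
```

Applied to the output above, every AVC record should fall into a single bucket: comm rasdaemon, class capability, permission dac_override. That suggests one underlying cause rather than several unrelated denials.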

DonKatsu avatar May 02 '23 23:05 DonKatsu

Thank you, I can confirm that. I've created a kernel bz to make the file read-write. https://bugzilla.redhat.com/show_bug.cgi?id=2192910
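
For anyone following along, the reasoning can be read off the PATH and SYSCALL records: buffer_percent is owned by root with mode 440, and rasdaemon (running as root) opens it with O_WRONLY. The owner bits grant no write access, so the open can only succeed by bypassing DAC, which is what dac_override means. Below is a minimal sketch of that check; the function name `owner_dac_allows` is hypothetical, it only looks at the owner permission bits, and it is not how the kernel implements the check.

```python
import os
import stat

# Mask covering the access-mode part of open() flags (read, write, read-write).
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def owner_dac_allows(mode, flags):
    """True if the owner bits in `mode` permit an open() with `flags`."""
    access = flags & ACCESS_MODE_MASK
    need_read = access in (os.O_RDONLY, os.O_RDWR)
    need_write = access in (os.O_WRONLY, os.O_RDWR)
    if need_read and not mode & stat.S_IRUSR:
        return False
    if need_write and not mode & stat.S_IWUSR:
        return False
    return True


# Values taken from the audit records: mode=file,440 and a2=O_WRONLY.
print(owner_dac_allows(0o440, os.O_WRONLY))
```

With the values from the log this reports that DAC alone does not permit the write. A mode with the owner write bit set (0o640 is used here purely as an illustration, not as the mode chosen in the kernel fix) would pass the same check without needing the capability.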

zpytela avatar May 03 '23 13:05 zpytela

Since that kernel change was implemented, I've no longer seen any rasdaemon-related SELinux denials.

However, I am still getting repeated crashes from rasdaemon 0.8.0. This is from the start of my most recent session. journal_snip.txt coredumpctl_gdb_rasdaemon.txt

DonKatsu avatar Jun 20 '23 20:06 DonKatsu

@DonKatsu The service was starting on my vm without errors, so please file a new bz on the ras component.

zpytela avatar Jun 23 '23 13:06 zpytela

Sorry, I didn't mean to imply the crashing was to do with selinux.

I had an MCE event today after nothing for two months. Again it didn't get picked up by rasdaemon; ras-mc-ctl --errors still shows "No MCE errors." rasdaemon crashed at the same time the corrected error was reported by the kernel. log.txt

DonKatsu avatar Jul 14 '23 00:07 DonKatsu