rasdaemon
rasdaemon copied to clipboard
Logs flooded with block_rq_complete events
It seems rasdaemon is logging 2 block_rq_complete events every 2 seconds. Sure these can't all be disk errors?
Running rasdaemon 0.6.5 on Gentoo, kernel 5.6.13.
I'm seeing them also, Arch Linux running 0.6.5-2 from AUR.
May 24 13:13:51 tock rasdaemon[25367]: rasdaemon: diskerror_eventstore: 0x559839dacf00
May 24 13:13:51 tock rasdaemon[25367]: rasdaemon: register inserted at db
May 24 13:13:51 tock rasdaemon[25367]: <...>-21 [001] 0.007774: block_rq_complete: 2020-05-24 13:13:51 -0500
May 24 13:13:53 tock rasdaemon[25367]: rasdaemon: diskerror_eventstore: 0x559839dacf00
May 24 13:13:53 tock rasdaemon[25367]: rasdaemon: register inserted at db
May 24 13:13:53 tock rasdaemon[25367]: <...>-21 [001] 0.007775: block_rq_complete: 2020-05-24 13:13:53 -0500
May 24 13:13:55 tock rasdaemon[25367]: rasdaemon: diskerror_eventstore: 0x559839dacf00
May 24 13:13:55 tock rasdaemon[25367]: rasdaemon: register inserted at db
May 24 13:13:55 tock rasdaemon[25367]:
Similar thing happens on Fedora 32...
I see the identical error on Arch Linux kernel 5.7.2 with a Kingwin KWCR-801U3 Memory Card Reader.
The errors are reported every 2 seconds for each memory card slot that is empty. The device has 5 memory card slots. If I populate one or two slots (only cards I have) the errors drop from 5x -> 3x errors every 2 seconds.
Gist logs @ 3a90eda6565ed64ff2627ccb7f24a1d2 with
- rasdaemon logs
- bpftrace bioerr script and output
- standalone perf record and log
Seems something is polling the device every 2 seconds and it fails and returns -5
.
Question I have:
- Is this the kernel polling on its own?
- Is there something in Gnome polling the device?
- Is the firmware/controller on the memory card to blame? (seems not since it works fine with real cards).
- Can these filtered or ignored if this is expected for a block device such as a memory card reader without media?
The same issues are exhibited on an old USB 2.0 IOGear GFR209 Memory Card reader, so I don't suspect that this is the device controller or firmware at this point.
@kylemanna great find! I have 2 Dell U2410 monitors, each with an SD card reader. Plugging in a card in one of them reduces the block_rq_complete events to one per 2 seconds. The other reader is not accessible as it is blocked by the other monitor so can't use it as a workaround. I will look into disabling the reader as a workaround until this can be filtered in rasdaemon.
That was easy:
echo 1 > /sys/block/sdi/device/delete
The above command inhibits the block_rq_complete event for the inaccessible card reader. So by doing this and having a card inserted in the other reader, my logs are no longer flooded. Not ideal, but good enough as a workaround.
Thanks again for pointing to card readers!
As an additional data point, this seems to be triggered by the kernel itself (not sure though, didn't dig any deeper) and is displayed with btrace
:
~ ❯❯❯ sudo btrace /dev/sd{j,k,l,m,n}
8,208 4 1 0.000000000 1971106 D N 0 [kworker/u16:0]
8,160 4 1 0.000896029 351 D N 0 [kworker/4:1H]
8,160 2 1 0.001481716 28 C N [65531]
8,176 2 1 0.002215997 28 C N [65531]
8,192 2 1 0.002240982 189 D N 0 [kworker/2:1H]
8,176 5 1 0.001530792 192 D N 0 [kworker/5:1H]
8,144 5 1 0.003275401 192 D N 0 [kworker/5:1H]
8,208 3 1 0.000787045 35 C N [65531]
8,192 3 1 0.003163968 35 C N [65531]
8,144 3 1 0.003949268 35 C N [65531]
8,176 2 2 2.134056407 28 C N [65531]
8,144 2 1 2.134721766 28 C N [65531]
8,208 2 1 2.135613506 28 C N [65531]
8,160 4 2 2.135647276 351 D N 0 [kworker/4:1H]
8,160 2 2 2.136260911 28 C N [65531]
8,192 2 2 2.136294531 189 D N 0 [kworker/2:1H]
8,192 2 3 2.137180495 28 C N [65531]
8,176 3 1 2.133407440 2092023 D N 0 [kworker/u16:2]
8,144 3 2 2.134108724 190 D N 0 [kworker/3:1H]
8,208 0 1 2.134767357 191 D N 0 [kworker/0:1H]
8,176 6 1 4.269513561 324 D N 0 [kworker/6:1H]
8,160 4 3 4.266711553 1971106 D N 0 [kworker/u16:0]
8,192 4 1 4.267451333 351 D N 0 [kworker/4:1H]
8,144 4 1 4.268103277 351 D N 0 [kworker/4:1H]
8,144 2 2 4.268701934 28 C N [65531]
8,160 2 3 4.267347711 28 C N [65531]
8,192 2 4 4.268011237 28 C N [65531]
8,208 2 2 4.269472730 28 C N [65531]
8,208 3 2 4.268780240 190 D N 0 [kworker/3:1H]
8,176 3 2 4.270435353 35 C N [65531]
8,160 7 1 6.402437598 292 D N 0 [kworker/7:1H]
8,192 5 1 6.401618694 192 D N 0 [kworker/5:1H]
8,208 2 3 6.400015344 2099065 D N 0 [kworker/u16:5]
8,208 2 4 6.400746353 28 C N [65531]
8,176 2 3 6.400786333 189 D N 0 [kworker/2:1H]
8,144 4 2 6.403306835 2106419 D N 0 [kworker/u16:8]
8,176 6 2 6.401546412 56 C N [65531]
8,192 6 1 6.402379568 56 C N [65531]
8,160 6 1 6.403024128 56 C N [65531]
8,144 6 1 6.403950697 56 C N [65531]
8,192 6 2 8.535625476 324 D N 0 [kworker/6:1H]
8,144 1 1 8.533310599 2106419 D N 0 [kworker/u16:8]
8,160 1 1 8.534111119 294 D N 0 [kworker/1:1H]
8,208 7 1 8.536296498 292 D N 0 [kworker/7:1H]
8,144 2 3 8.534055160 28 C N [65531]
8,160 2 4 8.534901067 28 C N [65531]
8,176 2 4 8.535580824 28 C N [65531]
8,192 2 5 8.536218365 28 C N [65531]
8,208 2 5 8.537182627 28 C N [65531]
8,176 4 1 8.534957975 351 D N 0 [kworker/4:1H]
8,208 6 1 10.667369536 56 C N [65531]
8,192 6 3 10.671544493 324 D N 0 [kworker/6:1H]
8,192 6 4 10.672275125 56 C N [65531]
8,208 1 1 10.666689562 1971106 D N 0 [kworker/u16:0]
8,176 1 1 10.667442261 294 D N 0 [kworker/1:1H]
8,176 2 5 10.668204709 28 C N [65531]
8,160 2 5 10.670772221 28 C N [65531]
8,144 2 4 10.671500585 28 C N [65531]
8,160 4 4 10.670013979 2055637 D N 0 [kworker/u16:1]
8,144 4 3 10.670856289 351 D N 0 [kworker/4:1H]
Just ejecting and re-inserting an optical disc into a drive will cause a flood of these messages.
Is there a workaround? Echoing 1 to /sys/block/sdi/[DEVICE]/delete gets rid of these messages but also the block device ;-).
Still happens on 0.6.8. I no longer have the monitors with built-in SD card readers, but as soon as I plug in an USB SD card reader, there's again a journal entry appearing every 2 seconds.
Version 0.7.0 uses a new event to monitor disk errors. It requires Kernel 5.18 or upper. Hopefully this should help with the issue.