zram-generator

reset-device sometimes fails

Open r-vdp opened this issue 9 months ago • 4 comments

Sometimes the reset-device command seems to exit with code 1 without any further information, leaving the initialised device in place, which then causes issues when trying to restart the systemd-zram-setup@zram0 service:

Mar 09 05:15:56 jobs-staging kernel: zram: Added device: zram0
Mar 09 05:15:56 jobs-staging systemd[1]: Created slice Slice /system/systemd-zram-setup.
Mar 09 05:15:56 jobs-staging systemd[1]: Expecting device /dev/zram0...
Mar 09 05:15:57 jobs-staging kernel: zram0: detected capacity change from 0 to 6311120
Mar 09 05:15:57 jobs-staging systemd[1]: Found device /dev/zram0.
Mar 09 05:15:57 jobs-staging systemd[1]: Starting Create swap on /dev/zram0...
Mar 09 05:15:57 jobs-staging systemd-makefs[627]: /dev/zram0 successfully formatted as swap (label "zram0", uuid 766a397f-d87f-4637-a7ef-df3f87a5167f)
Mar 09 05:15:57 jobs-staging systemd[1]: Finished Create swap on /dev/zram0.
Mar 09 05:15:57 jobs-staging systemd[1]: Activating swap Compressed Swap on /dev/zram0...
Mar 09 05:15:57 jobs-staging kernel: Adding 3155556k swap on /dev/zram0.  Priority:5 extents:1 across:3155556k SSDsc
Mar 09 05:15:57 jobs-staging systemd[1]: Activated swap Compressed Swap on /dev/zram0.
Mar 11 10:46:16 jobs-staging systemd[1]: dev-zram0.swap: Deactivated successfully.
Mar 11 10:46:16 jobs-staging systemd[1]: Deactivated swap Compressed Swap on /dev/zram0.
Mar 11 10:46:16 jobs-staging systemd[1]: Stopping Create swap on /dev/zram0...
Mar 11 10:46:16 jobs-staging systemd[1]: systemd-zram-setup@zram0.service: Control process exited, code=exited, status=1/FAILURE
Mar 11 10:46:16 jobs-staging systemd[1]: systemd-zram-setup@zram0.service: Failed with result 'exit-code'.
Mar 11 10:46:16 jobs-staging systemd[1]: Stopped Create swap on /dev/zram0.
Mar 11 10:46:19 jobs-staging systemd[1]: Starting Create swap on /dev/zram0...
Mar 11 10:46:19 jobs-staging kernel: zram: Can't change algorithm for initialized device
Mar 11 10:46:19 jobs-staging systemd[1]: systemd-zram-setup@zram0.service: Main process exited, code=exited, status=1/FAILURE
Mar 11 10:46:19 jobs-staging systemd[1]: systemd-zram-setup@zram0.service: Failed with result 'exit-code'.
Mar 11 10:46:19 jobs-staging systemd[1]: Failed to start Create swap on /dev/zram0.
Mar 11 10:46:19 jobs-staging systemd[1]: Dependency failed for Compressed Swap on /dev/zram0.
Mar 11 10:46:19 jobs-staging systemd[1]: dev-zram0.swap: Job dev-zram0.swap/start failed with result 'dependency'.

Manually resetting the device with echo 1 | tee /sys/block/zram0/reset allows the service to be started again.
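The manual workaround can also be sketched in Rust, for anyone who wants to script it. This is only an illustration, not code from zram-generator: the function name reset_if_initialised and the sysfs_root parameter are hypothetical (the parameter exists so the sketch can be run against a scratch directory instead of /sys/block), and it assumes the device's initstate attribute reads 1 while the device is initialised. The swap has to be deactivated already, as it is in the log above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: writes "1" to `<sysfs_root>/<device_name>/reset`
/// only if `<sysfs_root>/<device_name>/initstate` reports an initialised device.
/// Returns Ok(true) if a reset was requested, Ok(false) if none was needed.
pub fn reset_if_initialised(sysfs_root: &Path, device_name: &str) -> io::Result<bool> {
    let dev = sysfs_root.join(device_name);

    // Assumption: `initstate` contains "1" for an initialised device, "0" otherwise.
    let initialised = fs::read_to_string(dev.join("initstate"))?.trim() == "1";

    if initialised {
        // Same effect as `echo 1 | tee /sys/block/zram0/reset` when sysfs_root is /sys/block.
        fs::write(dev.join("reset"), b"1")?;
    }
    Ok(initialised)
}
```

Calling reset_if_initialised(Path::new("/sys/block"), "zram0") as root would mirror the manual command; any error from the write is returned to the caller rather than swallowed.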

I've seen this many times now on different machines, but I haven't been able to accurately reproduce it.

r-vdp, Mar 11 '25 16:03

This corresponds to setup::run_device_reset(&dev) returning an Err, which main then returns in turn, and which should log

Error: [contents of error]

to the standard error stream which I don't see in the log. Maybe it got misattributed somehow? (I don't think this should be the case since we have default I/O in the service.) Do you have anything in the unfiltered journal from around the time where the first stoppage and failure happens (Mar 11 10:46:16)?

The code itself is

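use std::{fs, path::Path};
use anyhow::Result; // assumed: the Result alias in scope is not shown in the thread

// Reset the device by writing "1" to /sys/block/<device_name>/reset.
pub fn run_device_reset(device_name: &str) -> Result<()> {
    let reset = Path::new("/sys/block").join(device_name).join("reset");
    fs::write(reset, b"1")?; // the only fallible step
    Ok(())
}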

which is basically infallible (literally equivalent to printf 1 > /sys/block/$1/reset) and there are no other error paths.
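To narrow down where such a write could fail, the open and the write can be separated and each error annotated with the path and the raw errno. The following is a minimal sketch, not the project's code: reset_with_context and its sysfs_root parameter are hypothetical names introduced here (the parameter only makes the sketch testable outside /sys), and it uses only the standard library. The write step is where a sysfs handler would report an errno; EBUSY for a device the kernel still considers in use is one plausible candidate, but that is an assumption and nothing in the log confirms it.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical variant of the reset: writes "1" to `<sysfs_root>/<device_name>/reset`
/// and attaches the failing step, the path, and the raw OS error number to any error.
pub fn reset_with_context(sysfs_root: &Path, device_name: &str) -> io::Result<()> {
    let reset = sysfs_root.join(device_name).join("reset");

    // Step 1: opening the attribute can fail (for example, missing file or no permission).
    let mut file = OpenOptions::new().write(true).open(&reset).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("open {}: {} (errno {:?})", reset.display(), e, e.raw_os_error()),
        )
    })?;

    // Step 2: the write is where the kernel-side handler runs and may return an errno.
    file.write_all(b"1").map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("write {}: {} (errno {:?})", reset.display(), e, e.raw_os_error()),
        )
    })?;

    Ok(())
}
```

Returned from main, an error built this way would name the step and the errno in the Error: line, which is the information missing from the journal excerpt.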

If you don't have anything in the journal, could you perhaps modify systemd-zram-setup@zram0.service to have ExecStop=strace ... instead?

nabijaczleweli, Mar 11 '25 18:03

Yeah, I checked the source code and I don't get how this can fail either.

How do I see those unfiltered logs?

I can try the strace later, because I keep on running into this on different servers on a weekly basis.

r-vdp, Mar 17 '25 23:03

An unfiltered journalctl --since=... --until=... should be the ground truth, I think.

nabijaczleweli, Mar 17 '25 23:03

Ah, in that case there's nothing more...

I'll try with strace.

r-vdp, Mar 17 '25 23:03