hdd-spindown.sh
hdd-spindown doesn't suspend disks anymore
Hi, I'm not sure if it's just me, but hdd-spindown has stopped suspending disks on my machine after a recent Arch update. I have tried to track it down but was unable to figure out what's stopping it. I think it has something to do with the new Linux kernel: after upgrading to 5.5.5, hdd-spindown doesn't suspend anymore. All the logs look normal and there are no errors reported. My .rc file is super simple: an ID for a single disk and a timeout. It has worked fine for over a year, so something must have changed in the kernel and the way hdd-spindown detects the absence of disk activity.
I was thinking about using Arch as my NAS OS just yesterday, but in the end I decided to use Debian. That seems to have been the right choice! haha~
You can try the command:
hdparm -y /dev/sdx
Then check whether the disk is in standby mode with:
hdparm -C /dev/sdx
You can also use this command:
smartctl -i -n standby /dev/sdx | grep "mode" | awk '{print $4}'
Thanks to lynix for developing this tool, it's exactly what I need! This problem has had me worried for a day, as I have two disks without advanced power management.
Unfortunately I've only got one machine left that has rotating disks (I migrated everything else to NVMe SSDs), and that one is running linux-lts, which is currently at 5.4.21 on Arch.
I'll try to put something together in order to test with a recent kernel.
I've tried with a recent Arch installation on kernel 5.5.6.arch1-1. No issues so far; the disk is put to sleep as expected.
@nick-s-b could you please install the version from branch debug/issue-5 and see what the debugging output I have added tells you? This way we can see whether the I/O counters are not read correctly or whether it's a bug in the logic.
@lynix Thank you for the response! I've installed the issue-5 branch, restarted the service, and opened a folder on the HDD in question to spin up the disk. I then closed the folder and the file manager and haven't touched it since. All this happened at 14:23 (according to the logs below). Here's the full journal output:
-- Logs begin at Sun 2020-02-23 22:23:19 EST, end at Thu 2020-02-27 14:38:00 EST. --
Feb 23 22:23:21 user0 systemd[1]: Started Automatic Disk Standby.
Feb 23 22:23:21 user0 hdd-spindown.sh[781]: Using 300s interval
Feb 23 22:23:21 user0 hdd-spindown.sh[781]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 systemd[1]: Stopping Automatic Disk Standby...
Feb 27 14:23:00 user0 systemd[1]: hdd-spindown.service: Succeeded.
Feb 27 14:23:00 user0 systemd[1]: Stopped Automatic Disk Standby.
Feb 27 14:23:00 user0 systemd[1]: Started Automatic Disk Standby.
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: Using 300s interval
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old , new 100296 3975
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100296 3975, new 100351 3975
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100351 3975, new 100401 3975
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100401 3975, new 100456 3975
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:43:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100456 3975, new 100506 3975
Feb 27 14:43:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
From the above, the service was restarted and the disk activated at 14:23, so the disk should have been spun down at 14:33 (10 minutes later, the disk timeout I specified). However, this did not happen.
systemctl status looks like this:
● hdd-spindown.service - Automatic Disk Standby
Loaded: loaded (/usr/lib/systemd/system/hdd-spindown.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2020-02-27 14:23:00 EST; 17min ago
Main PID: 2801704 (hdd-spindown.sh)
Tasks: 2 (limit: 38343)
Memory: 2.5M
CGroup: /system.slice/hdd-spindown.service
├─2801704 /bin/bash /usr/bin/hdd-spindown.sh
└─2818704 sleep 300
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: Using 300s interval
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old , new 100296 3975
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100296 3975, new 100351 3975
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100351 3975, new 100401 3975
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100401 3975, new 100456 3975
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
.rc file has these two lines in it:
CONF_DEV=( 'ata-ST8000BB100-1GXMNA4_89HAG761|600' )
CONF_INT=300
and I'm still on 5.5.5 (can't reboot right now... I have a process that has been running for a few days and will run for another couple of days).
Linux user0 5.5.5-arch1-1 #1 SMP PREEMPT Thu, 20 Feb 2020 18:23:09 +0000 x86_64 GNU/Linux
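To spell out what those two lines mean (my reading of the format; the ID is the device name under /dev/disk/by-id and the timeout is in seconds):
# one 'device-id|timeout' entry per disk:
#   ata-ST8000BB100-1GXMNA4_89HAG761  ->  resolved to sdb (see the journal above)
#   600  ->  spin down after 600 s (10 min) without I/O
CONF_DEV=( 'ata-ST8000BB100-1GXMNA4_89HAG761|600' )
# polling interval in seconds ("Using 300s interval" in the journal)
CONF_INT=300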
I then used hdparm -qy /dev/sdb to manually put the disk to sleep at 14:45. I confirmed it with:
$ sudo hdparm -C /dev/sdb
/dev/sdb:
drive state is: standby
and then the logs had this output a few minutes later:
Feb 27 14:48:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100506 3975, new 100526 3975
Feb 27 14:48:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Edit 2: sdb has now been in standby since 14:45 and here's the newest output...
Feb 27 14:53:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100526 3975, new 100536 3975
Feb 27 14:53:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:58:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100536 3975, new 100548 3975
Feb 27 14:58:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Thank you so much for looking into this!
Looking at your traces I see that the read counter (the first number) is constantly increasing. That's why the disk is not put to sleep.
Interestingly, in your last trace this is still the case. If the drive remained suspended during that trace, then the read requests must all have been served from cache, which is a weakness of my approach to determining drive activity.
So you need to find out which process keeps reading from that disk. There is a kernel option to dump all I/O access to dmesg, but this can get dangerous: writing those dmesg entries to disk causes further entries, and you end up with self-amplification.
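The knob in question is presumably vm.block_dump (present in kernels before 5.14, under /proc/sys/vm); a minimal sketch of how one would use it, keeping the self-amplification caveat in mind:
# enable kernel-level I/O logging (pre-5.14 kernels; path assumed)
echo 1 > /proc/sys/vm/block_dump
# watch which processes touch sdb -- ideally with syslog stopped or
# logging to tmpfs, otherwise writing the log causes further entries
dmesg -w | grep sdb
# turn it off again as soon as possible
echo 0 > /proc/sys/vm/block_dump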
I'd also think about adding an option to hdd-spindown.sh to check only the write counter, which remains constant in your case. But that would have the downside of putting the drive to sleep too often for read-heavy workloads.
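A minimal sketch of that idea, reading the counters from /sys/block/<dev>/stat as in the debug output above (the option itself does not exist yet, so this is purely illustrative):
# compare only the write I/O counter (5th field of /sys/block/sdb/stat)
read -r R_IO R_M R_S R_T W_IO REST < "/sys/block/sdb/stat"
if [ "$W_IO" = "$W_IO_OLD" ]; then
    echo "sdb: no write activity since last check"
fi
W_IO_OLD="$W_IO"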
@lynix Thank you! I'll look into it myself as well and try to determine what's accessing the disk. I'll also update the kernel, since that might be the cause. I don't think I've changed anything in my day-to-day use: same text editor, same DM, same FM, etc. One thing that has changed is that I now have two web browsers open at all times for development. Could it be that Firefox Dev is causing this, since I started using it heavily right around the update? I don't know; I'll try to narrow it down. One thing that's so weird about this is that the disk doesn't spin up at all after being put into standby, yet these read counters keep increasing. I'm hoping this is a kernel issue, since it will either get fixed or become the new normal. Thanks again. I'll report back in a few days.
I would like to report that reading the drive's S.M.A.R.T. data causes the read value to go up. I'm going to try stopping the systemd smartmontools service to see what happens.
PS: I'm also getting this problem (Linux 5.6).
Apr 15 03:48:24 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 203 to 196
Apr 15 03:48:52 chrholly hdd-spindown.sh[28827]: debug: sdf: I/O detected, updating counters
Apr 15 04:18:23 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 185
Apr 15 04:18:24 chrholly hdd-spindown.sh[28827]: debug: sdf: I/O detected, updating counters
Stopping the service now
Apr 15 04:48:31 chrholly hdd-spindown.sh[28827]: debug: sdf: trying to suspend
Apr 15 04:48:32 chrholly hdd-spindown.sh[28827]: suspending sdf
Heh that fixed it
@nick-s-b disable your smartmontools service.
@USBhost hmmm... mine's not even running:
$ systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/usr/lib/systemd/system/smartd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:smartd(8)
man:smartd.conf(5)
But yeah, hdd-spindown is still not suspending disks here. I've been doing it manually, and I also wrote a cron script to make sure the disks are suspended when I sleep.
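Something like this, for reference (a simplified sketch of that cron job, not the exact script; the file name is made up):
#!/bin/bash
# force-standby.sh -- run from cron overnight, e.g.: 0 1 * * * /usr/local/bin/force-standby.sh
for dev in /dev/sdb; do
    state=$(hdparm -C "$dev" | awk '/drive state/ {print $4}')
    # hdparm -C uses CHECK POWER MODE and does not wake a sleeping drive
    [ "$state" = "standby" ] || hdparm -qy "$dev"
done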
Do we know more about this by now? I had the same phenomenon after upgrading from kernel 5.4: the "read" values in /proc/diskstats went up without any process actually reading data.
@bedouin67 I could never get it to run properly again and have uninstalled it. I might give it another try after I do an upgrade this weekend.
I'm happy to accept pull requests for ignoring the read counter, if SMART readouts really do make it increase constantly.
However I must admit I don't have any rotating disks at hand anymore, so I will not be able to test anything.
I don't think it's a good idea to ignore the read counter and base the check only on the write counter. I was playing a video to test exactly this, and the write counter never changed:
❯ read R_IO R_M R_S R_T W_IO REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_IO $W_IO"
Mo 22. Mär 15:17:51 CET 2021 --- 1031 47
❯ read R_IO R_M R_S R_T W_IO REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_IO $W_IO"
Mo 22. Mär 15:28:25 CET 2021 --- 1948 47
This issue occurs on my Arch install with kernel 5.11; on my Pi4 running Raspbian kernel 5.10.17-v7l+, however, it still works fine.
Looking at the kernel documentation for /sys/block/<dev>/stat I see that, apart from the I/O counters, there are also sector counters:
Name units description
---- ----- -----------
read I/Os requests number of read I/Os processed
read merges requests number of read I/Os merged with in-queue I/O
read sectors sectors number of sectors read
read ticks milliseconds total wait time for read requests
write I/Os requests number of write I/Os processed
write merges requests number of write I/Os merged with in-queue I/O
write sectors sectors number of sectors written
write ticks milliseconds total wait time for write requests
in_flight requests number of I/Os currently in flight
io_ticks milliseconds total time this block device has been active
time_in_queue milliseconds total wait time for all requests
discard I/Os requests number of discard I/Os processed
discard merges requests number of discard I/Os merged with in-queue I/O
discard sectors sectors number of sectors discarded
discard ticks milliseconds total wait time for discard requests
Maybe the sector counters are not incremented by SMART queries? If so, we could use them to determine drive activity.
The sector counters show the same behaviour as the I/O counters for me:
❯ read R_IO R_M R_S R_T W_IO W_M W_S REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_S $W_S"
Mo 22. Mär 18:47:08 CET 2021 --- 14287 48
❯ read R_IO R_M R_S R_T W_IO W_M W_S REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_S $W_S"
Mo 22. Mär 18:55:07 CET 2021 --- 14290 48
The Python library function psutil.disk_io_counters(perdisk=True) may be able to read the disk I/O counts. I tested it on Windows and it works. Note that to run it on Windows, you first need to execute the following command: diskperf -y
"smartctl -i -n standby /dev/sdx" Each time it is executed, the read IO will add 2. After the execution of this command, modify the last recorded IO count. Counteract its influence. Maybe it's a feasible way.
Run it during initialization to see if and how much the IO count will increase, and then modify the count according to the added value. This makes it compatible with different kernels.
When initializing, run it a few more times to make sure the data is correct
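A rough sketch of this compensation idea (every name here is invented; it assumes the read counter is the first field of /sys/block/<dev>/stat):
DEV=sdb
reads() { read -r R REST < "/sys/block/$DEV/stat"; echo "$R"; }

# calibrate at startup: measure the read I/O cost of one status query
# (run it a few times and make sure the value is stable, as said above)
BEFORE=$(reads)
smartctl -i -n standby "/dev/$DEV" > /dev/null
SMART_COST=$(( $(reads) - BEFORE ))    # expected to be about 2

# later, inside the polling loop:
OLD=$(reads)
smartctl -i -n standby "/dev/$DEV" > /dev/null
NEW=$(reads)
if [ $(( NEW - OLD )) -le "$SMART_COST" ]; then
    echo "$DEV: no real I/O since the last check"
fi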
Proposed script logic:
- Check the I/O count before checking the disk status.
- If the I/O count has not changed, assume provisionally that the disk has not been read or written.
- After checking the I/O count, check the disk status for logging.
- After checking the disk status, check the I/O count again. If the increment equals the increment measured at initialization, overwrite the old I/O count with the new one and conclude that the disk really has not been read or written.
An even better way: do not query the disk status at all. If the configured time has passed with no I/O change, run the spindown command directly, regardless of the current disk state, and write the log entry after executing the spindown command.
It seems that the I/O count also changes after the spindown command is executed, so the script should immediately re-read and record the current count. Since the I/O has changed anyway, this is also a good moment to run the status-check command to make sure the spindown succeeded.
To prevent the log from growing indefinitely, we can add a flag bit that represents a change in I/O: set it to 0 after each spindown, set it to 1 whenever the I/O changes, and test it before writing a log entry.
This flag bit is not as reliable as the actually detected disk status, so it should only serve as a condition for logging.
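Put together, the whole loop would look roughly like this (a sketch only; every variable name is invented):
LAST_IO="" IDLE_SINCE=$(date +%s) ACTIVE=1
while sleep "$CONF_INT"; do
    IO=$(cat "/sys/block/$DEV/stat")
    NOW=$(date +%s)
    if [ "$IO" != "$LAST_IO" ]; then
        # I/O changed: remember the new counters and raise the flag bit
        LAST_IO="$IO"; IDLE_SINCE=$NOW; ACTIVE=1
    elif [ $(( NOW - IDLE_SINCE )) -ge "$CONF_TIMEOUT" ] && [ "$ACTIVE" = 1 ]; then
        hdparm -qy "/dev/$DEV"                 # spin down without querying the state first
        LAST_IO=$(cat "/sys/block/$DEV/stat")  # the command itself changes the count, so re-read
        hdparm -C "/dev/$DEV"                  # now check the state once, to log the result
        ACTIVE=0                               # flag bit: suspend and log only once per idle period
    fi
done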
@rankaiyx Thanks for 5 notification mails within two hours ;)
I'm not sure I can follow your explanations, specifically the latest one with the flag. Either way, this is something I would consider too 'complex' to implement or add myself without the ability to test anything. And, as I said above, I don't have any rotating disks anymore.
Feel free to fork the project and go ahead with the extended counter detection logic. I guess I will put this instance of the project into archived mode.
I'm having the same issue on my machine (running the most recent stable kernel) and I think other spindown solutions are suffering from the same problem.
According to the hd-idle readme, on kernels > 5.4 monitoring tools alter the disk-level read/write counts, so hd-idle moved that logic to the partition level, where the values stay unaffected.
I haven't had a look at the hdd-spindown code so far (because everything had been working great :), but this sounds like a nice project for a lazy weekend, so I might try my hand at it.
Well, I think I made it work; my disks have been happily spinning down for the past 24 hours (finally!). I rewrote some parts to use /proc/diskstats instead of checking the individual devices: hdd-spindown will now check whether the stats of individual partitions have changed since the last run.
There's still some debug output left in my fork, but I changed the documentation, so it's ready for a test drive at the very least :)
Please feel free to report any issues (on my repo, to keep lynix's notifications to a minimum). Once I'm confident enough that I actually created a working piece of code, I'll clean up the debug output and make a pull request.
NB: If it didn't before, hdd-spindown will now require bash version 4, because I'm using associative arrays for the partitions.
P.s. I'm not a programmer, not an expert either, I might have added bugs that will make your cat explode or your disks explode - or both! It works for me™, though :)
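For the impatient, the core of the change is something along these lines (a sketch reconstructed from my description above, not the actual code from the fork; this is where bash >= 4 comes in):
declare -A LAST_STATS    # per-partition stat lines from the previous run

disk_active() {
    local disk="$1" active=1 name stats
    # compare the full stat line of every partition of $disk (sdb1, sdb2, ...);
    # per the hd-idle readme, SMART polling inflates only the whole-disk counters
    while read -r _ _ name stats; do
        [[ $name == "$disk"?* ]] || continue
        if [[ ${LAST_STATS[$name]} != "$stats" ]]; then
            active=0                 # counters moved -> real disk activity
            LAST_STATS[$name]="$stats"
        fi
    done < /proc/diskstats
    return $active                   # 0 = active, 1 = idle
}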
Great news @bocki! I'll happily give your PR another pair of eyes and merge when looking good.
Nice! I took a look at your fixed code, and it looks great. I'll test it and then give you feedback.
The sdXn device names may change; it may be better to use UUIDs.
It has been tested and it works. Cheers! OS: openmediavault, kernel 5.10.24.