hdd-spindown.sh
hdd-spindown doesn't suspend disks anymore
Hi, I'm not sure if it's just me, but hdd-spindown has stopped suspending disks on my machine after a recent Arch update. I have tried to track it down but was unable to figure out what's stopping it. I think it has something to do with the new Linux kernel: after upgrading to 5.5.5, hdd-spindown doesn't suspend anymore. All the logs look normal and there are no errors reported. My .rc file is super simple: an ID for a single disk and a timeout. It has worked fine for over a year, so something must have changed in the kernel and the way hdd-spindown detects the absence of disk activity.
I was thinking about using Arch as my NAS OS just yesterday, but in the end I decided to use Debian. That seems to have been the right choice! haha~
You can try the command:
hdparm -y /dev/sdx
Then check whether the disk is in standby mode with:
hdparm -C /dev/sdx
You can also use this command:
smartctl -i -n standby /dev/sdx | grep "mode" | awk '{print $4}'
Thanks to lynix for developing this tool, it's exactly what I need! This problem has had me worried for a day, as I have two disks without advanced power management.
Unfortunately I've only got one machine left that has rotating disks (I migrated everything else to NVMe SSDs), and that one is running linux-lts, which is currently at 5.4.21 on Arch.
I'll try to put something together in order to test with a recent kernel.
I've tried with a recent Arch installation on kernel 5.5.6.arch1-1. No issues so far; the disk is put to sleep as expected.
@nick-s-b could you please install the version from branch debug/issue-5 and see what the debugging output I have added tells you? This way we can see whether the I/O counters are not read correctly or whether it's a bug in the logic.
@lynix Thank you for the response! I've installed the issue-5 branch, restarted the service, and opened a folder on the HDD in question to spin up the disk. I then closed the folder and the file manager and haven't touched it since. All this happened at 14:23 (according to the logs below). Here's the full journal output:
-- Logs begin at Sun 2020-02-23 22:23:19 EST, end at Thu 2020-02-27 14:38:00 EST. --
Feb 23 22:23:21 user0 systemd[1]: Started Automatic Disk Standby.
Feb 23 22:23:21 user0 hdd-spindown.sh[781]: Using 300s interval
Feb 23 22:23:21 user0 hdd-spindown.sh[781]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 systemd[1]: Stopping Automatic Disk Standby...
Feb 27 14:23:00 user0 systemd[1]: hdd-spindown.service: Succeeded.
Feb 27 14:23:00 user0 systemd[1]: Stopped Automatic Disk Standby.
Feb 27 14:23:00 user0 systemd[1]: Started Automatic Disk Standby.
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: Using 300s interval
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old , new 100296 3975
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100296 3975, new 100351 3975
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100351 3975, new 100401 3975
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100401 3975, new 100456 3975
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:43:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100456 3975, new 100506 3975
Feb 27 14:43:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
From the above, the service was restarted and the disk activated at 14:23, so the disk should have been spun down at 14:33 (10 minutes later, the disk timeout I specified). However, this did not happen.
systemctl status looks like this:
● hdd-spindown.service - Automatic Disk Standby
Loaded: loaded (/usr/lib/systemd/system/hdd-spindown.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2020-02-27 14:23:00 EST; 17min ago
Main PID: 2801704 (hdd-spindown.sh)
Tasks: 2 (limit: 38343)
Memory: 2.5M
CGroup: /system.slice/hdd-spindown.service
├─2801704 /bin/bash /usr/bin/hdd-spindown.sh
└─2818704 sleep 300
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: Using 300s interval
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: recognized disk: ata-ST8000BB100-1GXMNA4_89HAG761 --> sdb
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old , new 100296 3975
Feb 27 14:23:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100296 3975, new 100351 3975
Feb 27 14:28:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100351 3975, new 100401 3975
Feb 27 14:33:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100401 3975, new 100456 3975
Feb 27 14:38:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
.rc file has these two lines in it:
CONF_DEV=( 'ata-ST8000BB100-1GXMNA4_89HAG761|600' )
CONF_INT=300
and I'm still on 5.5.5 (can't reboot right now... I have a process that has been running for a few days and will run for another couple of days).
Linux user0 5.5.5-arch1-1 #1 SMP PREEMPT Thu, 20 Feb 2020 18:23:09 +0000 x86_64 GNU/Linux
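To spell out what those two lines mean (my reading of the format; the ID is the device name under /dev/disk/by-id and the timeout is in seconds):
# one 'device-id|timeout' entry per disk:
#   ata-ST8000BB100-1GXMNA4_89HAG761  ->  resolved to sdb (see the journal above)
#   600  ->  spin down after 600 s (10 min) without I/O
CONF_DEV=( 'ata-ST8000BB100-1GXMNA4_89HAG761|600' )
# polling interval in seconds ("Using 300s interval" in the journal)
CONF_INT=300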
I then used hdparm -qy /dev/sdb to manually put the disk to sleep at 14:45. I confirmed it with:
$ sudo hdparm -C /dev/sdb
/dev/sdb:
drive state is: standby
and then the logs had this output a few minutes later:
Feb 27 14:48:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100506 3975, new 100526 3975
Feb 27 14:48:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Edit 2: sdb has now been in standby since 14:45 and here's the newest output...
Feb 27 14:53:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100526 3975, new 100536 3975
Feb 27 14:53:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Feb 27 14:58:00 user0 hdd-spindown.sh[2801704]: debug: sdb: old 100536 3975, new 100548 3975
Feb 27 14:58:00 user0 hdd-spindown.sh[2801704]: debug: sdb: I/O detected, updating counters
Thank you so much for looking into this!
Looking at your traces I see that the read counter (the first number) is constantly increasing. That's why the disk is not put to sleep.
Interestingly, in your last trace this is still the case. If the drive remained suspended during that trace, then the read requests must all have been served from cache, which is a weakness of my approach to determining drive activity.
So you need to find out which process keeps reading from that disk. There is a kernel option to dump all I/O access to dmesg, but this can get dangerous: writing those dmesg entries to disk causes further entries, and you end up with self-amplification.
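The knob in question is presumably vm.block_dump (present in kernels before 5.14, under /proc/sys/vm); a minimal sketch of how one would use it, keeping the self-amplification caveat in mind:
# enable kernel-level I/O logging (pre-5.14 kernels; path assumed)
echo 1 > /proc/sys/vm/block_dump
# watch which processes touch sdb -- ideally with syslog stopped or
# logging to tmpfs, otherwise writing the log causes further entries
dmesg -w | grep sdb
# turn it off again as soon as possible
echo 0 > /proc/sys/vm/block_dump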
I'd also think about adding an option to hdd-spindown.sh to check only the write counter, which remains constant in your case. But that would have the downside of putting the drive to sleep too often for read-heavy workloads.
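A minimal sketch of that idea, reading the counters from /sys/block/<dev>/stat as in the debug output above (the option itself does not exist yet, so this is purely illustrative):
# compare only the write I/O counter (5th field of /sys/block/sdb/stat)
read -r R_IO R_M R_S R_T W_IO REST < "/sys/block/sdb/stat"
if [ "$W_IO" = "$W_IO_OLD" ]; then
    echo "sdb: no write activity since last check"
fi
W_IO_OLD="$W_IO"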
@lynix Thank you! I'll look into it myself as well and try to determine what's accessing the disk. I'll also update the kernel, since that might be the cause. I don't think I've changed anything in my day-to-day use: same text editor, same DM, same FM, etc. One thing that has changed is that I now have two web browsers open at all times for development. Could it be that Firefox Dev is causing this, since I started using it heavily right around the update? I don't know; I'll try to narrow it down. One thing that's so weird about this is that the disk doesn't spin up at all after being put into standby, yet these read counters keep increasing. I'm hoping this is a kernel issue, since it will either get fixed or become the new normal. Thanks again. I'll report back in a few days.
I would like to report that reading the drive's S.M.A.R.T. data causes the read value to go up. I'm going to try stopping the systemd smartmontools service to see what happens.
PS: I'm also getting this problem (Linux 5.6).
Apr 15 03:48:24 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 203 to 196
Apr 15 03:48:52 chrholly hdd-spindown.sh[28827]: debug: sdf: I/O detected, updating counters
Apr 15 04:18:23 chrholly smartd[1413]: Device: /dev/sdf [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 196 to 185
Apr 15 04:18:24 chrholly hdd-spindown.sh[28827]: debug: sdf: I/O detected, updating counters
Stopping the service now
Apr 15 04:48:31 chrholly hdd-spindown.sh[28827]: debug: sdf: trying to suspend
Apr 15 04:48:32 chrholly hdd-spindown.sh[28827]: suspending sdf
Heh that fixed it
@nick-s-b disable your smartmontools service.
@USBhost hmmm... mine's not even running:
$ systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/usr/lib/systemd/system/smartd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:smartd(8)
man:smartd.conf(5)
But yeah, hdd-spindown is still not suspending disks here. I've been doing it manually, and I also wrote a cron script to make sure the disks are suspended when I sleep.
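Something like this, for reference (a simplified sketch of that cron job, not the exact script; the file name is made up):
#!/bin/bash
# force-standby.sh -- run from cron overnight, e.g.: 0 1 * * * /usr/local/bin/force-standby.sh
for dev in /dev/sdb; do
    state=$(hdparm -C "$dev" | awk '/drive state/ {print $4}')
    # hdparm -C uses CHECK POWER MODE and does not wake a sleeping drive
    [ "$state" = "standby" ] || hdparm -qy "$dev"
done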
Do we know more about this by now? I had the same phenomenon after upgrading from kernel 5.4: the "read" values in /proc/diskstats went up without any process actually reading data.
@bedouin67 I could never get it to run properly again and have uninstalled it. I might give it another try after I do an upgrade this weekend.
I'm happy to accept pull requests for ignoring the read counter, if SMART readouts really do make it increase constantly.
However I must admit I don't have any rotating disks at hand anymore, so I will not be able to test anything.
I don't think it's a good idea to ignore the read counter and base the check only on the write counter. I was playing a video to test exactly this, and the write counter never changed:
❯ read R_IO R_M R_S R_T W_IO REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_IO $W_IO"
Mo 22. Mär 15:17:51 CET 2021 --- 1031 47
❯ read R_IO R_M R_S R_T W_IO REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_IO $W_IO"
Mo 22. Mär 15:28:25 CET 2021 --- 1948 47
This issue occurs on my Arch install with kernel 5.11; on my Pi4 running Raspbian kernel 5.10.17-v7l+, however, it still works fine.
Looking at the kernel documentation for /sys/block/<dev>/stat I see that, apart from the I/O counters, there are also sector counters:
Name units description
---- ----- -----------
read I/Os requests number of read I/Os processed
read merges requests number of read I/Os merged with in-queue I/O
read sectors sectors number of sectors read
read ticks milliseconds total wait time for read requests
write I/Os requests number of write I/Os processed
write merges requests number of write I/Os merged with in-queue I/O
write sectors sectors number of sectors written
write ticks milliseconds total wait time for write requests
in_flight requests number of I/Os currently in flight
io_ticks milliseconds total time this block device has been active
time_in_queue milliseconds total wait time for all requests
discard I/Os requests number of discard I/Os processed
discard merges requests number of discard I/Os merged with in-queue I/O
discard sectors sectors number of sectors discarded
discard ticks milliseconds total wait time for discard requests
Maybe the sector counters are not incremented by SMART queries? If so, we could use them to determine drive activity.
The sector counters show the same behaviour as the I/O counters for me:
❯ read R_IO R_M R_S R_T W_IO W_M W_S REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_S $W_S"
Mo 22. Mär 18:47:08 CET 2021 --- 14287 48
❯ read R_IO R_M R_S R_T W_IO W_M W_S REST < "/sys/block/sdd/stat"; echo "$(date) --- $R_S $W_S"
Mo 22. Mär 18:55:07 CET 2021 --- 14290 48
The Python library function psutil.disk_io_counters(perdisk=True) may be able to read the disk I/O counts. I tested it on Windows and it works. Note that to run it on Windows, you first need to execute the following command: diskperf -y
"smartctl -i -n standby /dev/sdx" Each time it is executed, the read IO will add 2. After the execution of this command, modify the last recorded IO count. Counteract its influence. Maybe it's a feasible way.
Run it during initialization to see if and how much the IO count will increase, and then modify the count according to the added value. This makes it compatible with different kernels.
When initializing, run it a few more times to make sure the data is correct
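A rough sketch of this compensation idea (every name here is invented; it assumes the read counter is the first field of /sys/block/<dev>/stat):
DEV=sdb
reads() { read -r R REST < "/sys/block/$DEV/stat"; echo "$R"; }

# calibrate at startup: measure the read I/O cost of one status query
# (run it a few times and make sure the value is stable, as said above)
BEFORE=$(reads)
smartctl -i -n standby "/dev/$DEV" > /dev/null
SMART_COST=$(( $(reads) - BEFORE ))    # expected to be about 2

# later, inside the polling loop:
OLD=$(reads)
smartctl -i -n standby "/dev/$DEV" > /dev/null
NEW=$(reads)
if [ $(( NEW - OLD )) -le "$SMART_COST" ]; then
    echo "$DEV: no real I/O since the last check"
fi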
Proposed script logic:
- Check the I/O count before checking the disk status.
- If the I/O count has not changed, assume provisionally that the disk has not been read or written.
- After checking the I/O count, check the disk status for logging.
- After checking the disk status, check the I/O count again. If the increment equals the increment measured at initialization, overwrite the old I/O count with the new one and conclude that the disk really has not been read or written.
An even better way: do not query the disk status at all. If the configured time has passed with no I/O change, run the spindown command directly, regardless of the current disk state, and write the log entry after executing the spindown command.
It seems that the I/O count also changes after the spindown command is executed, so the script should immediately re-read and record the current count. Since the I/O has changed anyway, this is also a good moment to run the status-check command to make sure the spindown succeeded.
To prevent the log from growing indefinitely, we can add a flag bit that represents a change in I/O: set it to 0 after each spindown, set it to 1 whenever the I/O changes, and test it before writing a log entry.
This flag bit is not as reliable as the actually detected disk status, so it should only serve as a condition for logging.
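Put together, the whole loop would look roughly like this (a sketch only; every variable name is invented):
LAST_IO="" IDLE_SINCE=$(date +%s) ACTIVE=1
while sleep "$CONF_INT"; do
    IO=$(cat "/sys/block/$DEV/stat")
    NOW=$(date +%s)
    if [ "$IO" != "$LAST_IO" ]; then
        # I/O changed: remember the new counters and raise the flag bit
        LAST_IO="$IO"; IDLE_SINCE=$NOW; ACTIVE=1
    elif [ $(( NOW - IDLE_SINCE )) -ge "$CONF_TIMEOUT" ] && [ "$ACTIVE" = 1 ]; then
        hdparm -qy "/dev/$DEV"                 # spin down without querying the state first
        LAST_IO=$(cat "/sys/block/$DEV/stat")  # the command itself changes the count, so re-read
        hdparm -C "/dev/$DEV"                  # now check the state once, to log the result
        ACTIVE=0                               # flag bit: suspend and log only once per idle period
    fi
done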
@rankaiyx Thanks for 5 notification mails within two hours ;)
I'm not sure I can follow your explanations, specifically the latest one with the flag. Either way, this is something I would consider too 'complex' to implement or add myself without the ability to test anything. And, as I said above, I don't have any rotating disks anymore.
Feel free to fork the project and go ahead with the extended counter detection logic. I guess I will put this instance of the project into archived mode.
I'm having the same issue on my machine (running the most recent stable kernel) and I think other spindown solutions are suffering from the same problem.
According to the hd-idle readme, on kernels > 5.4 monitoring tools alter the disk-level read/write counts, so hd-idle moved that logic to the partition level, where the values stay unaffected.
I haven't had a look at the hdd-spindown code so far (because everything had been working great :), but this sounds like a nice project for a lazy weekend, so I might try my hand at it.
Well, I think I made it work; my disks have been happily spinning down for the past 24 hours (finally!). I rewrote some parts to use /proc/diskstats instead of checking the individual devices: hdd-spindown will now check whether the stats of individual partitions have changed since the last run.
There's still some debug output left in my fork, but I changed the documentation, so it's ready for a test drive at the very least :)
Please feel free to report any issues (on my repo, to keep lynix's notifications to a minimum). Once I'm confident enough that I actually created a working piece of code, I'll clean up the debug output and make a pull request.
NB: If it didn't before, hdd-spindown will now require bash version 4, because I'm using associative arrays for the partitions.
P.s. I'm not a programmer, not an expert either, I might have added bugs that will make your cat explode or your disks explode - or both! It works for me™, though :)
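For the impatient, the core of the change is something along these lines (a sketch reconstructed from my description above, not the actual code from the fork; this is where bash >= 4 comes in):
declare -A LAST_STATS    # per-partition stat lines from the previous run

disk_active() {
    local disk="$1" active=1 name stats
    # compare the full stat line of every partition of $disk (sdb1, sdb2, ...);
    # per the hd-idle readme, SMART polling inflates only the whole-disk counters
    while read -r _ _ name stats; do
        [[ $name == "$disk"?* ]] || continue
        if [[ ${LAST_STATS[$name]} != "$stats" ]]; then
            active=0                 # counters moved -> real disk activity
            LAST_STATS[$name]="$stats"
        fi
    done < /proc/diskstats
    return $active                   # 0 = active, 1 = idle
}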
Great news @bocki! I'll happily give your PR another pair of eyes and merge when looking good.
Nice! I took a look at your fixed code, and it looks great. I'll test it and then give you feedback.
The sdXn device names may change; it may be better to use UUIDs.
It has been tested and it works. Cheers! OS: openmediavault, kernel 5.10.24.