Please let me configure the housekeeping interval or otherwise unbreak externally configured spindown

Open anordal opened this issue 6 years ago • 23 comments

Let's solve this issue.

I have 2 old Western Digital IDE harddisks that won't spin down when udisksd is running (unless I set their spindown timeout really short). I've had to:

sudo kill -SIGSTOP "$(pidof udisksd)"

and

sudo kill -SIGCONT "$(pidof udisksd)"

in order to hang/unhang the daemon as needed.

I tested modifying udisksd to set its housekeeping timeout really high (way outside the configurable spindown timeout for good measure) and that works for me:

--- a/src/udiskslinuxprovider.c
+++ b/src/udiskslinuxprovider.c
@@ -660,7 +660,7 @@ udisks_linux_provider_start (UDisksProvider *_provider)
   udisks_info ("Initialization complete");
 
   /* schedule housekeeping for every 10 minutes */
-  provider->housekeeping_timeout = g_timeout_add_seconds (10*60,
+  provider->housekeeping_timeout = g_timeout_add_seconds (13*60*60,
                                                           on_housekeeping_timeout,
                                                           provider);

A less hardcoded solution would be highly appreciated, so I don't have to run a custom version of udisksd. Maybe some sort of blacklist of disks not to poll too often, as a configuration file.

anordal avatar Sep 17 '17 05:09 anordal

I also have this problem: HDD spindown only works with timeouts under 10 minutes, which is too short.

There is some discussion going on here: https://bugs.launchpad.net/ubuntu/+source/udisks2/+bug/1281588

dp-alvarez avatar Apr 09 '18 03:04 dp-alvarez

I think this exact patch is sensible and should be accepted. Maybe they will notice it if someone makes a PR ?..

Rationale:

  • I have smartd watching my SMART, so I don't need anything else checking on my drives. I'm well aware of my SMART state since smartd actually sends me email notifications when something is wrong (unlike udisks2).
  • For a desktop user, it should be well enough to check the SMART only once in 13 hours AND at every boot. Considering that this bug breaks the functionality of all other related software (hdparm and smartd), this is a very serious issue, and is very much worth the possible tradeoff for checking less often. Breaking "serious" standard software designed specifically to watch over SMART is not a good tradeoff for a questionable benefit for desktop users.
  • Maybe a better solution could be implemented, like a simple "off" switch in the config file. But if that is too hard to engineer, this should be enough as a temporary fix, until there is a solution that does not involve breaking working configurations.

dark-penguin avatar Jun 13 '19 16:06 dark-penguin

Seems like this commit that was supposed to fix this problem does not work for some reason. If it could be fixed, that would be a perfect solution.

dark-penguin avatar Jun 13 '19 16:06 dark-penguin

I have found the problem.

If I disable smartd, and set the sleep timeout via per-device config files in /etc/udisks2, then udisksd indeed does not check those devices if there was no activity on them since the last check.
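
For reference, a minimal sketch of generating such a per-device file. It assumes the layout described in the udisks(8) man page as I understand it: a file named after the drive's Id property, containing an [ATA] group with a StandbyTimeout key that uses the same encoding as hdparm -S. The function name write_drive_conf and the drive id in the usage note are made up for illustration.

```shell
# Write a per-drive udisks2 config file (sketch; layout assumed from udisks(8)).
#   $1 = config directory (normally /etc/udisks2)
#   $2 = drive id (the drive's Id property; any value used here is a placeholder)
#   $3 = standby timeout in hdparm -S encoding (e.g. 241 = 30 minutes)
write_drive_conf() {
    printf '[ATA]\nStandbyTimeout=%s\n' "$3" > "$1/$2.conf"
}
```

Run as root, write_drive_conf /etc/udisks2 WDC-EXAMPLE-ID 241 would request a 30-minute standby timeout for the drive with that (hypothetical) id.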

If I remove those device-specific config files and set the sleep timeout via smartd instead, then smartd seems to be unaware that those devices are actually set to suspend soon, and so it checks them.

Now the situation seems obvious:

  • Is there any way to check the device's SMART to see if it is set to suspend by something else than udisks?
  • If there is no way to see that, then udisks should not care about whether or not the device is going to sleep, and avoid checking any devices that had no activity since the last check. This is still better than breaking sleeping scheduled by hdparm or smartd.
  • There is an argument that sometimes devices may go without activity for a long time, and we still want its SMART. If this is really more important than not breaking smartd and hdparm, then this behaviour could be optional (but I would argue that not breaking other things should be the default).
  • In any case, there must be a documented, configurable way to disable all SMART checks and setting any SMART parameters, for people who use other software to do that and don't want udisks messing with it (especially since it's not possible to simply uninstall udisks without uninstalling your whole desktop environment).
  • Also, even if you use the per-device config files, and set the timeouts from udisksd so that it is aware of them, this will not help very much, because smartd is going to check them too. Which means, it will consistently reset the suspend timer by creating activity other than udisksd checks. So the only way to avoid breaking smartd is to allow turning all SMART checks and manipulation off.
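
The "no activity since the last check" test from the second point could be sketched roughly as follows. This is only an illustration of the idea, not how udisksd does it: it snapshots the completed read and write counters (fields 1 and 5 of /sys/block/<dev>/stat, per the kernel's block stat documentation) and treats any change as activity. Both function names are hypothetical.

```shell
# Print "<read I/Os> <write I/Os>" from a block device stat file.
#   $1 = path to the stat file, e.g. /sys/block/sda/stat
io_counters() {
    awk '{ print $1, $5 }' "$1"
}

# Succeed (exit 0) if the counters differ from an earlier snapshot.
#   $1 = earlier output of io_counters
#   $2 = path to the stat file
had_activity() {
    [ "$(io_counters "$2")" != "$1" ]
}
```

A housekeeping pass would take a snapshot after each check and skip the next SMART query whenever had_activity reports no change.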

dark-penguin avatar Jun 13 '19 19:06 dark-penguin

If I set the sleep timeout via per-device config files in /etc/udisks2, then udisksd indeed does not check those devices if there was no activity on them since the last check.

So that is how the spindown timeout is supposed to be configured nowadays … I feel mind blown and betrayed at the same time. Running hdparm at boot was a solved problem – I did not have the fantasy to research other ways.

You don't even need smartd to reproduce this problem – I'm not running it (and I have that supposed fix). But you have a good point that there could be other daemons poking the disks in tandem with udisks, thereby violating my assumption that a polling interval above the maximum configurable spindown timeout would be safe.

I totally agree that it can't just break sleeping scheduled by hdparm or smartd. Because that's surprising. Alternatively, the surprise part must be fixed, by documenting it as a deficiency along with its workarounds.

anordal avatar Jun 13 '19 21:06 anordal

The most ridiculous thing is that there is no way to configure SMART check frequency. As I understand it, this is desktop-oriented software: it does not send email reports like server software, and it handles things like automount. But at the same time, it insists on checking SMART data every ten minutes.

I don't even understand why does it need this data. smartd needs it only to notify the user, but I certainly don't want any software reacting on my SMART state in any way without my permission. At the same time, the suggested way to configure sleep timeouts with smartd is "set a SMART check timeout greater than sleep timeout". Which means, it's fine to only check SMART once in a few hours, as it's recommended by the "serious" server software that actually needs it for a reason. So I believe checking SMART once a day by default should be fine for a desktop user. If they know what SMART is and want to check it more often, then they will configure their system to do so - either by smartd, or by configuring udisks2 (oh wait, there is no configuration option for that!).

For now, the best solution I could find is disabling udisksd. You can't uninstall it because gvfs depends on it, and the whole desktop environment depends on gvfs, but at least you can disable it. Seeing that I could not find any information about what else does it do and why do we need it, and having already caught it red-handed doing shady stuff, I think I would actually be safer disabling it completely.

Another example of udisks2 misbehaving in the past is automounting everything without thinking, for example drives already mounted on virtual machines. That's the kind of automation I certainly don't need; if I need things mounted, I'll do that myself. So I guess disabling the daemon will do more good than bad to protect my drives...

dark-penguin avatar Jun 14 '19 04:06 dark-penguin

So, now we see that this has nothing to do with "out-of-spec" hard drives. All hard drives are affected. I guess we can change the name of this issue to "please provide a way to stop udisks2 from breaking smartd and hdparm functionality".

dark-penguin avatar Jun 14 '19 10:06 dark-penguin

at least you can disable it

Not an option if you use any KDE software, or KDE itself, as they will mystically hang forever on startup if udisks2 isn't answering on D-bus. That was why I had to SIGSTOP/SIGCONT udisks2 to let it run only when I needed it to, without stopping and starting the daemon (as that would spin up everything).

anordal avatar Jun 14 '19 12:06 anordal

Not an option if you use any KDE software, or KDE itself, as they will mystically hang forever on startup if udisks2 isn't answering on D-bus. That was why I had to SIGSTOP/SIGCONT udisks2 to let it run only when I needed it to, without stopping and starting the daemon (as that would spin up everything).

That looks like a bug somewhere in KDE. Please file a bug report there; udisks clients are supposed to handle blocking calls asynchronously and should not block the rest of the desktop. There may be many possible scenarios of delays, e.g. waiting for a CD-ROM drive to spin up.

tbzatek avatar Jun 20 '19 15:06 tbzatek

I don't even understand why does it need this data. smartd needs it only to notify the user

And that's exactly the use case SMART monitoring has been implemented in udisks. You don't need smartd configured or installed on your system to be notified on the desktop that one of your drives is failing. There are several related plugins in gnome-settings-daemon that monitor various kinds of resources, not only the health of your physical disks.

but I certainly don't want any software reacting on my SMART state in any way without my permission.

Any other software is free to "react" on SMART data.

Another example of udisks2 misbehaving in the past is automounting everything without thinking, for example drives already mounted on virtual machines. That's the kind of automation I certainly don't need; if I need things mounted, I'll do that myself. So I guess disabling the daemon will do more good than bad to protect my drives...

Please open a separate ticket on this issue. There are more parties involved in automounting that carry the actual automounting policies, udisks usually acts only as the executive party doing the real mounting job.

Still such scenario shouldn't happen. If the drive is mounted on a virtual machine, it is a responsibility of the VM to lock it exclusively in the first place.

tbzatek avatar Jun 20 '19 15:06 tbzatek

Thanks for opening #668, it helps keeping the discussion separated from a real RFE. Let's continue with the polemic here.

So tweaking the housekeeping interval or making it user-configurable is most likely inevitable anyway. The problem is there are different kinds of housekeeping in udisks, ATA SMART monitoring being just one of them. With the introduction of modules we also perform housekeeping for each one of them and each module may actually perform multiple (unrelated) tasks. Moreover this is currently tied to a single interval, even for modules. Fortunately for us the module interface is not a public API and we don't support out-of-tree modules, making necessary modifications easy.

More work is needed here and this should be first thoroughly thought through.

tbzatek avatar Jun 20 '19 16:06 tbzatek

at least you can disable it

Not an option if you use any KDE software, or KDE itself, as they will mystically hang forever on startup if udisks2 isn't answering on D-bus. That was why I had to SIGSTOP/SIGCONT udisks2 to let it run only when I needed it to, without stopping and starting the daemon (as that would spin up everything).

That sounds like something depends on it, which systemd does not make easy to troubleshoot... I just tried masking udisks2 on Kubuntu Bionic (the only test machine with KDE that I have), and it booted fine. Still, usually systemd should say what exactly we are waiting for.

I don't even understand why does it need this data. smartd needs it only to notify the user

And that's exactly the use case SMART monitoring has been implemented in udisks.

I thought so! Then it is certainly no big deal if it checks your disks only once a day (and at every bootup), and certainly there should be an easy way to disable this functionality?.. Especially if the drawback of not having a way to disable it is this serious.

but I certainly don't want any software reacting on my SMART state in any way without my permission.

Any other software is free to "react" on SMART data.

I thought it might be doing something else other than notifying the user, which I certainly would not want - it is a scary thought that some software could do something with my disks without any way to disable it!

Another example of udisks2 misbehaving in the past is automounting everything without thinking, for example drives already mounted on virtual machines. That's the kind of automation I certainly don't need; if I need things mounted, I'll do that myself. So I guess disabling the daemon will do more good than bad to protect my drives...

Please open a separate ticket on this issue. There are more parties involved in automounting that carry the actual automounting policies, udisks usually acts only as the executive party doing the real mounting job.

Still such scenario shouldn't happen. If the drive is mounted on a virtual machine, it is a responsibility of the VM to lock it exclusively in the first place.

I remember seeing this issue somewhere; as I see here, it was apparently fixed before udisks2.

The general issue here is that if some software is not possible to "simply not use" due to being a core part of the system, then its functionality should be really well documented and very tunable. The udisks2 man page is very short, with very few functions described, so it is basically undocumented and unconfigurable. (I would be happy to be proven wrong about it, or help change it!) When people think about this thing touching their disks, a panic attack is imminent! :) Then you start googling, and see even more potentially dangerous issues in the past... Since this is indeed more of a general "polemic" issue than a specific RFE, and the one people will find first, I thought that it would be helpful to post my findings about what options we have for disabling udisks2 and why you might want to do it. Is there any document where we could read about everything udisks2 does (other than the source code) and how to disable specific parts of it?

Back to the issue at hand:

  • Migration to smartmontools would fix everything, but it's not going to happen soon.
  • I would suppose that parsing smartd's configs is indeed a bad idea. If we do that, we'll get even more furious and confused people asking why are we touching stuff belonging to an unrelated package.
  • Is my understanding correct that it's not possible to query a drive for its specified standby timeout? This would make honoring smartd's options much easier...

So, the best course of action I could think of is:

  • Provide a way to set the default ATA options for all drives. That should probably be fairly trivial, and it would give people an easy way to work around the problem by specifying a standby timeout via udisks2.
  • Provide a way to disable housekeeping checks (and document exactly what functionality the user will lose if they do that! It is absolutely not obvious that there is something other than SMART queries.) Simply setting the interval would not help very much, because then smartd and udisks will take turns waking the device up (if for some reason they are not started at the same time).
  • Consider setting the housekeeping interval to 13 hours by default (and upon starting udisksd).

If somebody is willing to put more effort into this than providing an easy workaround (and I'm not necessarily saying it's worth the effort), then consider the following algorithm as a starting point:

Treat all drives like they are going to sleep, but even more carefully:

  • only do a housekeeping job if there was activity less than one minute ago
  • if there was not, then start asking the kernel every minute if there was any activity (or maybe it's possible to place some kind of hook to detect activity?)
  • once there is, do a check and start counting the interval for this drive from this moment

This way, we can synchronize up to one minute to smartd or anything else, which should be fine in most cases, if not all of them. This would require separate countdowns for each drive; is that going to be a problem?..
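
As a starting point, the per-drive decision could look something like this. It is a sketch of the idea above, not udisks code: the function name, the three action names and the once-per-minute tick are assumptions made for illustration. It is meant to be called once per minute for each drive, with that drive's I/O counters (for example the read and write counts from /sys/block/<dev>/stat) sampled now and one minute ago.

```shell
# Decide what to do for one drive on a once-per-minute tick (sketch).
#   $1 = seconds since this drive's last housekeeping check
#   $2 = housekeeping interval in seconds
#   $3 = I/O counters sampled one minute ago
#   $4 = I/O counters sampled now
# Prints one of:
#   wait   interval not elapsed yet, leave the drive alone
#   check  interval elapsed and the drive was active in the last minute:
#          run housekeeping and restart this drive's countdown
#   poll   interval elapsed but the drive is idle: do not touch it,
#          look at the counters again on the next tick
housekeeping_action() {
    if [ "$1" -lt "$2" ]; then
        echo wait
    elif [ "$3" != "$4" ]; then
        echo check
    else
        echo poll
    fi
}
```

The caller would keep one elapsed-time counter per drive and reset it only on "check", which is where the separate countdowns come from.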

dark-penguin avatar Jun 20 '19 18:06 dark-penguin

Could this be a configurable option? If we could specify the default timeout in udisks2.conf, this would actually be a fix rather than a workaround to make the problem harder to notice for most people.

I often swap drives in some machines, so specifying it per-drive makes this something you need to remember to do. And I want a 3 hours timeout, so 1 hour is still not enough, I still have to configure it.

EDIT: #668 is asking to have a way to configure the sleep timeout, and this is asking to have a way to configure the housekeeping interval. Either one would help - and wouldn't it make more sense to have parameters configurable rather than hard-coded?..

dark-penguin avatar Oct 05 '21 06:10 dark-penguin

Note that the bump to 1 hour housekeeping interval is a distro patch – you only get it if you use PLD Linux.

And I'm still using Weston instead of KDE, 4 years later, because I can't be bothered to write that config file.

anordal avatar Oct 05 '21 10:10 anordal

I stumbled over this, which may explain my observation that KDE programs, and KDE itself, hang waiting for udisks: https://blog.broulik.de/2022/11/performance-musings/

The most important thing to remember with Qt DBus: Never use QDBusInterface. This innocent-looking class does a blocking introspection of the interface in its constructor! (…) On my laptop, I was able to speed up Dolphin’s startup by 50ms just by removing some QDBusInterface usage in Solid (the Framework which enumerates storage devices).

anordal avatar Nov 06 '22 09:11 anordal

Solved this annoying problem by apt purge udisks2. Good riddance.

lockie avatar Nov 18 '22 21:11 lockie

Udisks2 is a dependency of many packages. You can, however, disable the service. With systemd: systemctl disable udisks2. On systems that 'hard-code' enabling the udisks2 service, you can mask it: systemctl mask udisks2.

nlgranger avatar Nov 19 '22 10:11 nlgranger

But that also removes its useful functionality - I don't remember exactly what it was, but there was something useful about it. Something about mounting external media, not to mention its hard drive housekeeping and apparently notifications about bad SMART, if it really does that.

dark-penguin avatar Nov 19 '22 10:11 dark-penguin

I'd rather mount USB sticks manually than have my 4TB storage disk thrashed, thank you very much.

lockie avatar Nov 19 '22 11:11 lockie

I also have the same problem. I cannot spin down my disks using the hdparm service unless I use timeouts of less than 10 minutes. It's most likely because of the described problem with udisks2. HOW do you configure udisks2 to stop checking SMART data so frequently that it defeats other very important disk management functions, like spinning the disk down after a sensible interval such as 30 minutes? I struggled with hdparm and the systemd services that load it for weeks before I read about the issue with udisks2, and I'm stunned to read this thread. It's been going on for years now, and it's still a huge issue.

Will the programmers behind udisks2 please take this seriously!

It's a major issue and very, very annoying and frustrating!!!

tue-kyndal avatar Dec 03 '22 14:12 tue-kyndal

One more here. This is one of the most annoying behaviours I've seen so far. There's a thread at https://bugs.launchpad.net/ubuntu/+source/udisks2/+bug/1373318 which mentions https://launchpadlibrarian.net/186132339/AdjustableHousekeeping.patch as a possible solution, so I'm wondering why that patch has not been applied and included yet.

petermolnar avatar Dec 20 '22 22:12 petermolnar

Comment from a bystander: You could open a PR with that but maybe it's not as simple as that, assuming that the last statement on this topic is still valid:

More work is needed here and this should be first thoroughly thought through.

dark-penguin already put some thought into this issue, but more of a "scratch your own itch" approach is needed on the path forward: someone has to sit down and do the hard work…

pothos avatar Dec 21 '22 21:12 pothos

I am experiencing the same problem with udisks2 and the 10-minute timeout. In my case, even if I set the spindown timeout to 9 minutes with hdparm -S 108 /dev/sda, the disk still doesn't spin down.
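
For anyone double-checking the numbers: hdparm -S does not take minutes. As far as I read the hdparm man page, values 1 to 240 are multiples of 5 seconds and 241 to 251 are multiples of 30 minutes, so 108 is indeed 9 minutes. A small helper to compute the value (the function name is made up, and the special values 252 to 255 are deliberately not handled):

```shell
# Convert a standby timeout in seconds to an hdparm -S value (sketch).
#   0              -> 0 (timeout disabled)
#   1..1200 s      -> 1..240, units of 5 seconds (rounded up)
#   30 min..5.5 h  -> 241..251, units of 30 minutes (exact multiples only)
hdparm_standby_value() {
    if [ "$1" -eq 0 ]; then
        echo 0
    elif [ "$1" -le 1200 ]; then
        echo $(( ($1 + 4) / 5 ))
    elif [ "$1" -le 19800 ] && [ $(( $1 % 1800 )) -eq 0 ]; then
        echo $(( 240 + $1 / 1800 ))
    else
        echo "timeout of $1 s has no simple -S encoding" >&2
        return 1
    fi
}
```

With this, hdparm_standby_value 540 prints 108, and hdparm_standby_value 10800 prints 246, which would be the value for a 3-hour timeout.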

The only solution so far that worked is by using this script: https://github.com/ngandrass/truenas-spindown-timer/blob/master/spindown_timer.sh started like this: spindown_timer.sh -v -t 540 -p 60 -m -i sda

I also tried to solve the problem with:

cat /etc/udev/rules.d/80-udisks.rules
KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="usb", ENV{DEVTYPE}=="disk", ENV{UDISKS_DISABLE_POLLING}="1"
KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", ENV{UDISKS_DISABLE_POLLING}="1"
KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="scsi", ENV{DEVTYPE}=="disk", ENV{ID_VENDOR}=="ATA", ENV{UDISKS_DISABLE_POLLING}="1"

but that didn't work either.

Interestingly, this 80-udisks.rules solution does not work on Raspbian OS, but it works on OSMC. It looks like the OSMC developers have patched something correctly. My disk is a WD Green 4TB.

zljubisic avatar Jan 26 '23 12:01 zljubisic