udisks icon indicating copy to clipboard operation
udisks copied to clipboard

Method calls not fully synchronous

Open vpodzime opened this issue 6 years ago • 16 comments

If a method that results in objects' properties being changed is called, the call finishes while the properties are still being updated by the "udev feedback loop". So in some sense such method calls are not fully synchronous because they finish before all the related changes are done.

This affects a number of methods, but typical examples are part/FS resize (#572) and/or FS (un)mount (#573).

We need to fix this because otherwise we are forcing many users of the UDisks2 API to wait for the properties to be fully synchronized in their code. And it's of course not 100% clear what is the full set of properties to check after a particular method call.

There are multiple difficulties that need to be taken into account:

  • we cannot change the API
  • waiting for the properties to sync up may cause a DBus call timeout
  • things can be changed outside of UDisks2, even concurrently

vpodzime avatar Sep 22 '18 10:09 vpodzime

A proposed solution to this

We will add a new standard option for setting the total method call timeout. The default will be 0 meaning no change in the API and behaviour. -1 will mean "wait forever" (not recommended) and any positive value would specify the maximum number of seconds the method call is allowed take including waiting for all affected properties to synchronize after a successful execution of the method's changes/operations. Because the changes/operations in the method call may take significant (and unpredictable) time on their own, the specified timeout has to be a total sum of operations + property-sync.

If the specified timeout is reached, a new standard error will be returned indicating that the method call was successful, but properties are not yet fully synchronized. The operation/changes done by the method call will never be interrupted because of the timeout no matter how small it is.

The caller will be responsible for making sure that the specified method-call timeout is shorter than the DBus method call timeout.

vpodzime avatar Sep 22 '18 10:09 vpodzime

@vojtechtrefny @pothos @psusi @aruiz, what do you think about the above suggestion?

vpodzime avatar Sep 22 '18 10:09 vpodzime

I'm not sure about 0 as the default, it would mean that clients need to change for a sane behavior.

pothos avatar Sep 22 '18 11:09 pothos

I'm not sure about 0 as the default, it would mean that clients need to change for a sane behavior.

Yes, but unfortunately UDisks2 has always had this insane behavior and it shouldn't just change in a minor version. And I don't think it's worth branching and releasing a new major version.

vpodzime avatar Sep 25 '18 08:09 vpodzime

Sorry for taking so long to respond, I've been a little busy with the birth of my third child.

Is it even really possible to have the API wait for properties to be updated? Even if Udisks does not complete the API until it sees the properties updated, are there enough ordering guarantees in the DBus stack to be sure that when the client gets the message and performs post-processing of the API call, that the client sees the updated property? Why should the client even care? I can't think of any good reason it should. What originated this was the fact that g-d-u loops over the mount points property and keeps trying to unmount any found until the list is empty, and so it assumes the list is updated. Perhaps that is simply a logic error in g-d-u and it should just make a copy of the mount points list, and unmount each one only one time, then move on, without caring about the property being updated?

psusi avatar Oct 08 '18 13:10 psusi

Agreed this is a major problem for some use cases. UDisks has been built around udev and inherits its asynchronous nature. This is often a benefit for GUI use however as the displayed properties are updated asynchronously automatically just by binding the right signals. And that was historically the primary use case.

I'm little worried about guarantees of uevent delivery - in case udev is under high load, incoming events may take significant time to get delivered. If UDisks is supposed to be enterprise environment-aware, any arbitrary timeout poses a risk of missed update.

I'm not sure we're able to solve this issue entirely with the current udev design. The proposal of adding an optional settle timeout would certainly help in many cases but would never solve the root issue. Consequently waiting indefinitely poses a risk of blocking forever (given that the client D-Bus method call timeout is altered accordingly).

Users should also keep in mind the way D-Bus properties are retrieved and updated (e.g. on a proxy object) in their D-Bus client implementation of choice.

tbzatek avatar Nov 01 '18 17:11 tbzatek

Sorry for taking so long to respond, I've been a little busy with the birth of my third child.

Congratulations and all the best @psusi!

vpodzime avatar Nov 29 '18 13:11 vpodzime

I'm not sure we're able to solve this issue entirely with the current udev design.

I am sure we are not. :) But what I have been trying to suggest is to make things better within the boundaries of keeping the API stable. I think adding the timeout for the property updates will improve things even if it is not guaranteed to be all that needs to be done in all the cases. It has at least two benefits:

  1. The logic/knowledge of what will/should be updated is kept in UDisks.
  2. The busy-wait loops wouldn't be that much busy.

vpodzime avatar Nov 29 '18 13:11 vpodzime

Unmounting is broken again for GNOME Disks because it still sees the mountpoint in the filesystem property. There was a waiter added but then some state cleanup logic added and so on… Maybe there is a way to work around it in GNOME Disks by getting a fresher object state or waiting again, don't know. It's very easy to reproduce with loopback devices it seems (Create multiple partitions, make sure that all are mounted, and then click the minus button to remove to get the error message that something is already unmounted).

pothos avatar Mar 05 '20 22:03 pothos

By "again" you mean with the latest stable release, git master or when testing the linked pull request? There are many changes going on in this regard, the target state is that when Unmount() method call returns, all properties should reflect the actual state. That's what the synthetic uevent tagging work was generally about.

tbzatek avatar Mar 06 '20 10:03 tbzatek

Out of curiosity, the issue you've described - are you receiving the D-Bus property change notifications or not at all? Is there a considerable delay between such signals after a related method call returns?

The way UDisks API is supposed to be used in GUI is to connect signals to properties and objects and let the GUI react on events rather than doing explicit GUI update right after a relevant method call returns. Of course this approach fails when you serialize operations within a batch or a script and that's what we're focusing on right now but that doesn't seem to be the case you've described.

tbzatek avatar Mar 06 '20 11:03 tbzatek

Thanks for the answer, yes, I should have been more specific that it's about the versions currently shipped in Fedora 31 and Debian sid (2.8.4) and I didn't try out anything else.

Good point, the problem is that the DBus property change is propagated too slow to the application, and I thought that reading the property throught the client C library (udisks_filesystem_get_mount_points) would somehow make sure that the freshest state is fetched but that does not seem to be the case - I need to find out how exactly that works.

pothos avatar Mar 06 '20 20:03 pothos

@tbzatek I've tested again with your synth_uevent branch but it does not solve the problem. To test I created a GPT-partitioned loop device file with 10 ext4 partitions which were all mounted and then pressed the "-" button to detach the loop device. This automatically unmounts all filesystems until it sees that all mount points are gone. Because the property is not updated yet it tries to unmount a filesystem that already is unmounted which gives an error. So I guess until someone comes up with a clever way to fix this for udisks_filesystem_get_mount_points a waiter in GNOME Disks is needed - or the code is removed from GNOME Disks and gets a new home inside UDisks: EnsureIsUnused() to recursively unmount/lock children block devices of a block device.

pothos avatar Mar 06 '20 21:03 pothos

Thanks for testing the pull request. Just realized the org.freedesktop.UDisks2.Filesystem.Unmount() method doesn't trigger uevent explicitly. Please check the updated synth_uevent branch if it makes things better now.

If that doesn't help, we may need to check the D-Bus communication and processing of the messages, both on server and client side.

Can you point me to gnome-disks code where you call the Unmount() method please? I'm wondering how are D-Bus object properties handled there.

The libudisks2 library contains mostly convenient GDBus wrappers around the UDisks D-Bus interface, nothing special there. It's important however to create the UDisksFilesystem proxy within proper thread-default main context so that the main loop can receive D-Bus messages while the Unmount() method is being handled, i.e. called async. The GDBus worker thread should hopefully take care of serialization of incoming D-Bus messages itself.

tbzatek avatar Mar 09 '20 16:03 tbzatek

@tbzatek Great, that solves the problem! (The code is in https://gitlab.gnome.org/GNOME/gnome-disk-utility/-/blob/master/src/libgdu/gduutils.c#L1529)

pothos avatar Mar 09 '20 22:03 pothos

@pothos, thanks for testing, this is a useful finding! Looks like we'll have to use uevent trigger more often as we clearly can't rely on the kernel and differences in filesystem implementations.

Anyway, with the basic uevent tagging work has been merged, there's still some fine-tuning and testing work to do. The next step could be to get rid of the property update waits within the class DBusProperty tests. That would hopefully reveal more areas where improvement is needed.

tbzatek avatar Mar 13 '20 16:03 tbzatek