xdg-dbus-proxy icon indicating copy to clipboard operation
xdg-dbus-proxy copied to clipboard

Revisit requirement on monotone increasing client serials

Open A6GibKm opened this issue 2 years ago • 29 comments

https://github.com/flatpak/xdg-dbus-proxy/blob/b15c7b26d6d4dd4c947b1a2c836f598b75cbb5fe/flatpak-proxy.c#L2184-L2190 requires series to be monotonically increasing, for a client it might be impossible to ensure the serials are strictly monotonically increasing if sent from different threads.

A6GibKm avatar Jan 19 '23 16:01 A6GibKm

I agree with @A6GibKm here. This check/requirement is just wrong. The D-Bus spec says absolutely nothing about the serial number's order. They don't have to be increasing even, let alone monotonically increasing. I think the name is confusing. It's just a cookie to map method calls with their corresponding replies.

zeenix avatar Jan 19 '23 16:01 zeenix

By the way, when using decoder , closing and opening the camera has a certain probability to kill the portal altogether until the app is closed (with this error being displayed), so it might be breaking more apps.

A6GibKm avatar Jan 19 '23 16:01 A6GibKm

The proxy code relies on the serials to be monotonic, because we sometimes need to send our own messages from the proxy, and when we do this we bump last_serial and serial_offset, and then later we apply the serial_offset to new messages we get to make up for this.

For example, we may get from the client

  • message 1, set last_message=1
  • proxy sends its own message, we last_serial is 1, so we use serial 2, but set serial_offset to 1.
  • message 2 arrives, but we rewrite it to message 3 via adding serial_offset so things will not be confused by two id 2.

However, this kind of rewriting breaks if the client can send reordered messages:

  • message 2, set last_message = 2
  • proxy sends fake message with id 3, serial_offset=1
  • message 1 arrives, we bump its id by serial_offset, so now it is message 2

Now we have two message 2, which is going to be very confusing.

alexlarsson avatar Jun 07 '23 17:06 alexlarsson

A real fix for this would have to have a much more sophisticated tracking of which ids need to be rewritten and how much,

alexlarsson avatar Jun 07 '23 17:06 alexlarsson

One idea could be to do some systematic reencoding - e.g. use odd serials for all things passed through, and reserve all even serials for our own messages

Something like:

if (message is fake) {
   message.serial = next_serial;
   next_serial += 2;
}  else {
  message.serial = message.serial * 2 + 1;
}

Would that work?

matthiasclasen avatar Jun 07 '23 17:06 matthiasclasen

The proxy code relies on the serials to be monotonic, because

There are many valid use cases for having a monotonically increasing serial number on the messages but just because we need them, doesn't mean we solve it by assuming a cookie to be what we need it to be. :wink:

FWIW we need ordering of messages in zbus too and we had to even create a new library for ensuring order of messages where needed. Unfortunately our solution won't work for your use case, as you need the serials encoded on the messages on the bus.

zeenix avatar Jun 07 '23 18:06 zeenix

@zeenix You're not wrong, but what I mean is that we can't just delete this check.

@matthiasclasen Yeah, that may work. It's equivalent to using the low bit for "fake". Alternatively we could use the high bit. Or we could start fake serials from MAXINT and count down.

I guess the main risk is that such transformations would exhaust the serial space. For example, message.serial * 2 may overflow if message.serial is large. In practice though, i think all dbus libraries start small and increment, so it should be fairly risk-free.

alexlarsson avatar Jun 08 '23 07:06 alexlarsson

@zeenix You're not wrong, but what I mean is that we can't just delete this check.

Gotcha.

I recently had quite a lot of trouble in busd code because the bus itself also has to provide the freedesktop interfaces and differentiating between own messages and others ended up with a lot of messy code. In the end, I decided to create a self-dial connection in the same process that provides those interfaces (i-e bus itself is also a client). This made things much simpler and eliminated the need for me to treat bus' own messages differently than that of other peers/clients.

I don't know if something like that would work for you but if it can, that would be a much less hacky solution than what you currently have and what you're thinking about IMO.

I guess the main risk is that such transformations would exhaust the serial space. For example, message.serial * 2 may overflow if message.serial is large. In practice though, i think all dbus libraries start small and increment, so it should be fairly risk-free.

All current libraries maybe and we can find that out to be sure but tomorrow someone can write a new library (e.g for another fancy new language) and they'll be within their right to (for example) start serial numbers from MAXUINT32 or choose some other scheme that will break proxy again.

The issue is based on your assumptions about behavior of D-Bus libraries so I'll suggest not to go for a solution that is again based on assumptions. Having said that, personally I only care about zbus here and as long as proxy doesn't break against zbus, I don't particularly care how you solve this.

zeenix avatar Jun 08 '23 11:06 zeenix

All current libraries maybe and we can find that out to be sure but tomorrow someone can write a new library (e.g for another fancy new language) and they'll be within their right to (for example) start serial numbers from MAXUINT32 or choose some other scheme that will break proxy again.

It doesn't matter where you start. There's enough even and odd numbers in a 32bit integer to go around. Of course you'll need to handle overflow, but the principle stays the same

matthiasclasen avatar Jun 08 '23 13:06 matthiasclasen

I think it is likely that any dbus library implementation would use some kind of at least semi-continuos serial counter, rather than completely random. If nothing else that is the easiest way to avoid reuse. So, if we believe that a even/odd split or something like that is gonna be a problem then we can have a more precise remap tracking (say at the level of 128k serials chunks). However, in practice I think using the high-bit for internal serials is probably best. And, you don't even have to change the client message serials in that case (assuming high-bit==0 means non-fake.)

alexlarsson avatar Jun 09 '23 07:06 alexlarsson

And, you don't even have to change the client message serials in that case (assuming high-bit==0 means non-fake.)

True, that is a nice point.

So instead of asserting that the serials are monotonic, we'll have to assert that the high bit isn't set.

matthiasclasen avatar Jun 09 '23 12:06 matthiasclasen

We might not even need the whole upper half of the serial space. Could just assert that its not in say to top 128k serials, and then reserve those for internal use.

alexlarsson avatar Jun 15 '23 14:06 alexlarsson

Small heads up that this is now one of the last big issues blocking wider camera portal adoption (i.e. enabling it in Firefox and Chromium by default).

Edit: as in: I'd be super happy if this was solved by one of you - but will of course look into it myself if nobody else finds time/motivation :)

rmader avatar Jul 21 '23 15:07 rmader

Edit: as in: I'd be super happy if this was solved by one of you - but will of course look into it myself if nobody else finds time/motivation :)

I think this is the case unfortunately. Be the hero Robert. :)

zeenix avatar Sep 04 '23 11:09 zeenix

Just a heads-up, https://github.com/dbus2/zbus/pull/488/commits/2e0a8902decf9ac278803f5c7622b740b37829c4 (i-e which will be included in zbus 4.0) will probably make this issue even worse.

I'm really sorry for an inconvenience that may be caused by this change but honestly, zbus is still very much compliant with the spec here.

zeenix avatar Oct 07 '23 21:10 zeenix

Can this issue be linked to my bug? https://github.com/flatpak/flatpak/issues/5742 I am not an expert in D-Bus, but my app uses D-Bus heavily and it also activates new D-Bus clients. In my case only in ~30% of cases it works, otherwise I am getting Invalid client serial.

savely-krasovsky avatar Mar 25 '24 21:03 savely-krasovsky

Almost certainly yes. Notably, you're using godbus, which I did not realize existed. So now we have two independent D-Bus implementations that both trigger this bug. That kinda ends the argument that this should be fixed in zbus rather than in xdg-dbus-proxy.

mcatanzaro avatar Mar 25 '24 21:03 mcatanzaro

Can someone explain how someone could workaround it? Even if it will require D-Bus library patch (zbus, godbus, etc.). From what I understood, I need to synchronize spawning new D-Bus clients so their client serials will increase monotonically.

In my case I would suggest that without patching openvpn3-linux (which I am trying to activate by calling org.freedesktop.DBus.StartServiceByName) I will almost always get this bug if openvpn3 D-Bus services are not alive.

savely-krasovsky avatar Mar 25 '24 22:03 savely-krasovsky

That kinda ends the argument that this should be fixed in zbus rather than in xdg-dbus-proxy.

That was never a good argument anyway since the bug is not in zbus at all. :) An argument could have been made that perhaps it would be easier to fix it in zbus but that's not the case either. Otherwise, I'd have "fixed" it a long time ago.

zeenix avatar Mar 25 '24 22:03 zeenix

From what I understood, I need to synchronize spawning new D-Bus clients so their client serials will increase monotonically.

I suspect the serials are associated with messages, not clients. To workaround, you'd have to change how godbus computes client serials. It's probably easier to just fix the bug. Matthias proposes a scheme in https://github.com/flatpak/xdg-dbus-proxy/issues/46#issuecomment-1581271590.

mcatanzaro avatar Mar 25 '24 22:03 mcatanzaro

Well, godbus seems to generate serials "properly", it uses mutual lock to do it: https://github.com/godbus/dbus/blob/76236955d466b078d82dcb16b7cf1dcf40ac25df/conn.go#L834 But from what I understand, if flatpak app talks to "broken" D-Bus client (e. g. openvpn3-linux uses GDBus++) it will be also affected?

savely-krasovsky avatar Mar 25 '24 22:03 savely-krasovsky

Well, go dbus seems to generate serials "properly", it uses mutual lock to to it: https://github.com/godbus/dbus/blob/76236955d466b078d82dcb16b7cf1dcf40ac25df/conn.go#L834

This may not be enough. I think most (likely all) D-Bus libraries do that, including zbus. The issue is the sending bit: if there are multiple threads sending messages, thread B could end up sending its message with serial 2 before thread A gets to send its message with serial 1 first, even though thread A managed to get its message assigned its serial number before thread B.

zeenix avatar Mar 25 '24 23:03 zeenix

Well, I am surprised then how does this bug doesn't have a bigger impact. Any app which heavily uses D-Bus should meet this issue sooner or later.

savely-krasovsky avatar Mar 25 '24 23:03 savely-krasovsky

Well, I am surprised then how does this bug doesn't have a bigger impact. Any app which heavily uses D-Bus should meet this issue sooner or later.

Not all applications/APIs are multi-threaded. You probably won't have this issue with zbus either if you use it with a single-threaded Rust async runtime. Also, some APIs could be assigning the serial number just before sending off the message, while maintaining a lock on the socket.


In any case, I don't see much point in continuing this discussion. We know the issue and there is a proposal for a fix. It's mostly about someone implementing it now.

zeenix avatar Mar 25 '24 23:03 zeenix

Just FYI, in SailfishOS there's this patch: https://github.com/sailfishos/xdg-dbus-proxy/blob/master/rpm/0003-Use-hash-table-for-mapping-reply-serials-between-con.patch

No idea why they never contributed it upstream, but it might be related.

mardy avatar Mar 26 '24 13:03 mardy

Wow, it's unusual to see such a complex patch kept downstream. Good find.

mcatanzaro avatar Mar 26 '24 13:03 mcatanzaro

No idea why they never contributed it upstream, but it might be related.

Something like: Because of reasons I needed https://github.com/sailfishos/xdg-dbus-proxy/blob/master/rpm/0002-Add-D-Bus-interface-for-querying-client-process-deta.patch - which did not work because (iirc etc) bus-serial - client-serial mapping in proxy using "serial offset between connections" was unable to handle bus to proxy method calls that are answered without involving round trip to client behind proxy (which definitely makes the serial mapping to be non-linear, but there might also be other reasons why that could happen) --> the "use hash table" -patch. It is generic, and might help with other reply-serial related issues but I'm not sure. And suitably calm "now, lets upstream things" -moment never came.

spiiroin avatar Mar 27 '24 06:03 spiiroin

I'm not able to understand that patch from the code. Is the idea to remap client serials that have already been used by the proxy and are used by the client again and store that mapping? If so, that's probably the most elegant solution.

However, @matthiasclasen's suggestion to split serials into odd/even seems like the simplest and most robust solution to me. It makes it impossible to have collisions and avoids any kind of race conditions or clients starting with high serials. It also sounds pretty easy to implement.

sophie-h avatar Apr 07 '24 12:04 sophie-h

Please ignore my last comment. I had some bug in my thoughts.

I have implemented the lower and higher bit solution. I'm currently running it on my system. Review would be appreciated.

sophie-h avatar Apr 12 '24 12:04 sophie-h

:tada:

zeenix avatar May 07 '24 12:05 zeenix