[Question] Plans for Mainlining
Are there any plans for mainlining these patches upstream into the Wine project itself or, perhaps better, into the Wine-staging patches with a corresponding esync toggle? Or will these patches always remain a separate patchset? Thanks :)
There are some aspects of the design of these patches (namely, use of shared memory) and of their functionality (namely, that PulseEvent() and wait-all can't quite work right) that won't be acceptable upstream. As such I am afraid the patchset will always have to remain out-of-tree.
In that case, any chance of integrating it into Staging?
I noticed there are some conflicts with the current staging patches, so it doesn't apply cleanly right now (making it somewhat difficult for users to apply), but they're mainly due to things being moved around slightly rather than anything more serious.
I'd prefer it to be usable without staging. But if it can be made compatible with staging, it can benefit staging users.
It's been done already: https://github.com/Tk-Glitch/PKGBUILDS/tree/master/wine-tkg-git — usable with staging and/or mainline. It's a PKGBUILD, but the needed patches and process are there for anyone who wants to build it on a non-Arch distro.
Any chance these could be put in Staging under a toggle? Even if they'll never hit mainline it would be nice to be able to use this without another PPA.
Now that staging accepted esync, does it mean it was refactored and is potentially more suitable for upstream?
Now that staging accepted esync, does it mean it was refactored and is potentially more suitable for upstream?
No, it does not.
Ah, that's unfortunate. But at least it's easier to apply the patches now, which is a plus. Is there a way to ever make it upstreamable?
This keeps coming up, so I'll try to summarize here what makes it difficult, and what it would take to make it acceptable upstream. Summary: it is difficult. Some of this is covered in the README already, but not all.
- PulseEvent() is wrong. This affects a fair number of applications already. We approximate PulseEvent() by quickly setting and resetting the event (sketched below, after this list). For manual-reset events it's... kind of wrong, in that it leaves the event signaled for nonzero time, though I'd argue that honestly it's not really wrong in a meaningful way. For auto-reset events, however, it's definitely wrong. The gap between wakeup and read can race with the gap between set and reset, so that applications can miss the wakeup. This is what already breaks things. This is hard to fix on the Wine side; I think there are ways I can make it better by tracking internal state, but I don't think there are ways I can make it completely correct, and in any case tracking internal state means shared memory, which is a problem (see below). This is also... maybe easy to fix on the kernel side? The polling infrastructure, as I understand it, doesn't really allow for "wake up this fd but don't leave it signaled". But it seems possible to sort of bypass the polling infrastructure and do it directly. I think. So this could be added on the kernel side, presumably through some ioctl. I don't know if the kernel would accept this, however.
- WaitForMultipleObjects(..., TRUE, ...), aka "wait-all", i.e. waiting on all of a set of handles as opposed to any of them, is wrong. We emulate it by waiting on each object in sequence, then trying to grab all of them in a tight loop (also sketched after this list). This might fail, and we account for that by releasing objects if we can't grab one of them immediately. This tends to work for applications, and I don't see any immediate reason why it should fail. The way in which it's wrong means that we might grab an object temporarily and then release it, which should never cause a deadlock. It is, however, still wrong. This is essentially impossible to fix on the user side. We would need kernel support for "wait for all". But this seems also nontrivial to fix on the kernel side; poll() really isn't built that way. It may not be difficult to adapt it, but it would need a new syscall at least. Again, I don't know how friendly the kernel would be to this modification.
- Reading of previous state is not really atomic. (Though it's not clear from the documentation that it's required to be?) This is minor for obvious reasons, especially since almost nobody uses it. Pretty much impossible to fix on the user side; fixing on the kernel side would be pretty easy using some buffered ioctl.
- Shared memory. Letting processes share memory is fundamentally not safe. We do not, at all, want to give one process the ability to break any others, by intent or accident (beyond what it has already explicitly given itself permission to do). This is especially true if one of those processes is the wineserver (though the current state of things doesn't actually have the wineserver reading from shared state, which makes things a little better). We use shared memory to track internal state of objects so that we can return it in NtQuery*() methods (and as out parameters to NtSetEvent(), ReleaseSemaphore() et al.); the rough shape of this state is sketched after this list. We also use it to optimize events a bit, so that we can avoid syscalls. To be acceptable upstream, we need to do these things without shared memory. This is pretty much impossible to fix on the user side. On the kernel side, reporting state would be rather easy, and I suspect it would also be accepted upstream. Of course, we would not be able to optimize events in the same way. I don't remember how much that optimization actually bought us, mind, but I think it was likely nonnegligible. On the other hand, if that optimization is the only thing that stays out of tree, well, I'd be okay with that.
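To make the PulseEvent() point concrete, here is a minimal sketch of the set-and-reset approximation, assuming an event backed by a nonblocking eventfd as esync uses them. The function name is illustrative, and error handling is omitted:

```c
#include <stdint.h>
#include <unistd.h>

static void pulse_event_approx(int event_fd)
{
    uint64_t value = 1;

    /* "Set": make the eventfd readable, waking up any poll()ers. */
    write(event_fd, &value, sizeof(value));

    /* "Reset": immediately consume the value again.  An auto-reset
     * waiter that was woken by poll() but hasn't yet issued its own
     * consuming read() races with this read(); if we win that race,
     * the waiter misses the wakeup entirely. */
    read(event_fd, &value, sizeof(value));
}
```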
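Similarly, a rough sketch of the wait-all emulation, treating every object as an auto-reset eventfd for simplicity (the real code also has to deal with other object types, timeouts, and errors):

```c
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

static int try_grab(int fd)
{
    uint64_t value;
    /* Nonblocking read: succeeds only if the object is signaled. */
    return read(fd, &value, sizeof(value)) == sizeof(value);
}

static void release(int fd)
{
    uint64_t value = 1;
    write(fd, &value, sizeof(value));
}

static void wait_all(const int *fds, int count)
{
    for (;;)
    {
        int i, grabbed;

        /* First wait until each object has been signaled at least once. */
        for (i = 0; i < count; i++)
        {
            struct pollfd pfd = { .fd = fds[i], .events = POLLIN };
            poll(&pfd, 1, -1);
        }

        /* Then try to grab all of them in a tight loop.  If any grab
         * fails, put back the ones we already took and start over.
         * This can transiently acquire and then release objects, which
         * is the observably wrong part, but it should never deadlock. */
        for (grabbed = 0; grabbed < count; grabbed++)
            if (!try_grab(fds[grabbed])) break;

        if (grabbed == count) return; /* got all of them */

        while (grabbed--) release(fds[grabbed]);
    }
}
```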
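And for the shared memory point, roughly the kind of per-object state involved. The field names and the helper here are illustrative; the actual esync layout differs in the details:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct shm_event
{
    int signaled; /* lets NtQueryEvent() skip a server round trip */
    int locked;   /* coordinates set/reset without extra syscalls */
};

/* Hypothetical helper: map the section wineserver created and return
 * a pointer to one object's slot.  Because every process maps the
 * same section read-write, any process can corrupt this state for
 * all the others, which is the core objection. */
static struct shm_event *map_event_state(const char *shm_name, size_t index)
{
    size_t len = (index + 1) * sizeof(struct shm_event);
    int fd = shm_open(shm_name, O_RDWR, 0);
    struct shm_event *state = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    close(fd);
    return &state[index];
}
```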
Here are some entirely different approaches to the whole problem of synchronization:
a. A major constraint is the fact that we need to wait on several objects simultaneously. One alternative path that holds some initial promise is to act like a scheduler, where each thread waits on one object, and SetEvent() etc. does the work of traversing a wait queue associated with that object. But that needs some way of accessing objects that live in other processes. I don't see a way to do this with our constraints.
b. One of the earliest approaches, proposed by Alexandre Julliard himself, was to optimize objects which are only ever used from inside one process, and make all other objects full server objects. Since performance-critical paths should never involve synchronization between processes, this would be great. It would be especially great since being limited to one process means we can make use of things that wouldn't otherwise be available to us. The best paths here seem to me to be either using futexes instead of eventfds and implementing kernel support for waiting on multiple futexes, or using futexes plus an in-process scheduler (and no kernel modifications); see the futex sketch after this list. What makes this incredibly, incredibly difficult, and maybe even impossible, is DuplicateHandle(). It means we have to have a way to graduate in-process objects to server objects, and the fact that DuplicateHandle() can be called at any time, including while threads are waiting on the object, and from any process (which may in fact be neither the process that created the object nor the process that will receive the duplicated handle), means that problematic races are very, very, very difficult to avoid. I'm still not sure this approach is impossible, but I've spent a long time thinking about it, and it's at least very hard.
c. Locking. Extensive locking of objects. Problem is, to prevent some races we would have to lock the object between processes. That means implementing some sort of locking support in the eventfds themselves (or, alternatively, using one or more eventfds to implement global or per-object locks). This would be hugely expensive (though frankly, I'm not sure that even in-process locking using futexes or pure spinlocks would be great either). It might outperform wineserver, but I'm not even too sure of that. When developing the eventfd patches we were at the point of counting and trying to get rid of individual syscalls, which is why Proton optimizes the clock_gettime() calls in ntdll so that they can go through the vdso, and why I added the manual-reset event optimization mentioned above. I'm also not even sure that locking would solve all of our problems.
d. Using our own kernel objects. This would be sort of equivalent to making all of the modifications to eventfd mentioned above, except we wouldn't have to convince the Linux kernel to accept it. However, it means that we have to maintain it. I don't think Alexandre would be very happy about that, and I don't blame him. He might be slightly happier about it if I maintain it and Wine uses it as a separate library, but even then he may not be very happy about it. It's also difficult because it means that users would have to install an extra kernel module.
e. Optimizing wineserver. It's possible that the bottleneck isn't the pipe I/O calls themselves (which are two syscalls anyway, so not that bad even compared to eventfd), but rather the fact that all wineserver operations are serialized, so they get stuck in a queue and thus delayed. But this is only a conjecture. It would also mean that we'd want to make wineserver multi-threaded. This would be a pain and I'm not even sure Alexandre would accept it.
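To illustrate the futex idea in (b), here is a minimal sketch of an in-process manual-reset event, under the assumption that the object is only ever touched from one process. This is the general idea, not code from the patchset:

```c
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

struct inproc_event
{
    int signaled; /* lives in ordinary process memory; no fd needed */
};

static long futex(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void event_set(struct inproc_event *event)
{
    __atomic_store_n(&event->signaled, 1, __ATOMIC_SEQ_CST);
    futex(&event->signaled, FUTEX_WAKE, INT_MAX); /* manual-reset: wake everyone */
}

static void event_reset(struct inproc_event *event)
{
    __atomic_store_n(&event->signaled, 0, __ATOMIC_SEQ_CST);
}

static void event_wait(struct inproc_event *event)
{
    while (!__atomic_load_n(&event->signaled, __ATOMIC_SEQ_CST))
    {
        /* Sleep only while the value is still 0; the kernel rechecks
         * it atomically, so no wakeup can be lost here. */
        futex(&event->signaled, FUTEX_WAIT, 0);
    }
}
```

Note what's missing: there is no way for one thread to wait on several of these at once (that's where kernel support for multiple futexes or an in-process scheduler would come in), and nothing here survives DuplicateHandle() moving the object to another process, which is exactly the hard part described above.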
I may have forgotten something but that's all I can remember at the moment.
Thanks for the very detailed overview of the current situation!
It would also mean that we'd want to make wineserver multi-threaded. This would be a pain and I'm not even sure Alexandre would accept it.
Would using lock-less multithreading be a benefit for wineserver? There are very powerful libraries for that, like crossbeam: https://github.com/crossbeam-rs/crossbeam
I feel like doing all that in C is a very major pain indeed. Rust could solve such issues, being a far more advanced systems programming language better suited to this kind of thing.
Would using lock-less multithreading be a benefit for wineserver? There are very powerful libraries for that, like crossbeam: https://github.com/crossbeam-rs/crossbeam
Some things could potentially be lock-free, but I doubt a library would make that any easier.
I feel like doing all that in C is a very major pain indeed. Rust could solve such issues, being a far more advanced systems programming language better suited to this kind of thing.
Regardless of the relative virtues of the languages, rewriting parts of Wine in a different language sounds extremely unlikely to happen. I'd also submit that the main difficulty with concurrency is ensuring correctness while maintaining optimal performance; I doubt that a different language is going to help with that.
I doubt that a different language is going to help with that.
That's exactly what Rust enables: high performance with "fearless" concurrency. You can read various articles on this; it's a bit off-topic for this thread. There are some quite interesting posts from Mozilla about how they parallelized Servo in ways that would have been extremely difficult and costly in languages like C or C++, precisely because they used Rust.