dix: initialize all screens before proceeding to gc's, root windows etc
Adding asyncFlipPrivateKeyRec to modesetting has revealed one more problem. Current code walks along all screens and initializes screen resources, then gc's, stipples, root windows for each of them, hence after the first screen registering new private keys is no more possible. This crashes modesetting driver if it is not initialized before others.
Namely, after screen 0 is initialized, and root window for it is created, global_keys[PRIVATE_WINDOW].created becomes non-zero, hence no new private keys of this type can be registered, and modeset(1) fails at dixRegisterPrivatekey().
This patch makes screen resources for all screen initialize first, hence all necessary private keys (including of the type PRIVATE_WINDOW) are initialized before root windows are created. Tests show that it prevents occasional crashes on NVidia390 and constant crashes on NVidia 470, and makes no difference for NVidia 340 because it is not modesetting capable.
@ONykyf Are you sure the problem is recent? We were already registering other private keys before async flips.
Looking at https://github.com/X11Libre/xserver/commit/77830f228449ec34d69e746bbf3e5b2de374f513 , globals are 0-initialized regardless of if they are declared static or not.
Are you sure you are not hitting something else?
Other drivers like xf86-video-intel also register private keys, so this should affect more than modesetting.
Can you post the xorg.conf you used when you hit this?
@ONykyf Are you sure the problem is recent? We were already registering other private keys before async flips.
Yes because it's OK with xlibre-xserver-25.0.0.15, which has no async...
GDB debugging shows this.
Another thing that looks strange to me that xlibre-xserver-master initializes modeset(1) together with NVIDIA(0) for an NVidia card, while xlibre-xserver-25.0.0.15 does this only for NVIDIA(0), see log files stripped off time marks to simplify the comparison:
@ONykyf Support for server generations was recently dropped, and I suspect we did a bad job with it. Does this still happen on https://github.com/X11Libre/xserver/commit/434331e1290a0261647c22333c6f9f0843596c4b ?
@ONykyf Support for server generations was recently dropped, and I suspect we did a bad job with it. Does this still happen on 434331e ?
Now I have to go, will return at the evening and check this, if necessary.
Looks like there is an actual problem here. If I use the fbdev driver on the first screen, and the modesetting driver on the second screen, this is what happens on screen teardown:
waiting for X server to shut down X connection to :0 broken (explicit kill or server shutdown).
double free or corruption (!prev)
Backtrace:
0: X (xorg_backtrace+0x56) [0x5622e0b97a81]
1: X (0x5622e09ff000+0x19e1b2) [0x5622e0b9d1b2]
2: /lib64/libc.so.6 (0x7fd3d5c05000+0x3cbb0) [0x7fd3d5c41bb0]
3: /lib64/libc.so.6 (0x7fd3d5c05000+0x91eec) [0x7fd3d5c96eec]
4: /lib64/libc.so.6 (gsignal+0x12) [0x7fd3d5c41a82]
5: /lib64/libc.so.6 (abort+0x24) [0x7fd3d5c29f3b]
6: /lib64/libc.so.6 (0x7fd3d5c05000+0x260fd) [0x7fd3d5c2b0fd]
7: /lib64/libc.so.6 (0x7fd3d5c05000+0x9b8a7) [0x7fd3d5ca08a7]
8: /lib64/libc.so.6 (0x7fd3d5c05000+0x9d9ec) [0x7fd3d5ca29ec]
9: /lib64/libc.so.6 (0x7fd3d5c05000+0x9db10) [0x7fd3d5ca2b10]
10: /lib64/libc.so.6 (__libc_free+0x180) [0x7fd3d5ca5740]
11: X (_dixFreeObjectWithPrivates+0x30) [0x5622e0a9211b]
12: X (DeleteWindow+0x1d8) [0x5622e0aa85d7]
13: X (0x5622e09ff000+0x9d16d) [0x5622e0a9c16d]
14: X (FreeClientResources+0xcb) [0x5622e0a9cb14]
15: X (FreeAllResources+0x49) [0x5622e0a9cc0b]
16: X (0x5622e09ff000+0x6d4cb) [0x5622e0a6c4cb]
17: X (0x5622e09ff000+0x24551b) [0x5622e0c4451b]
18: /lib64/libc.so.6 (0x7fd3d5c05000+0x26faa) [0x7fd3d5c2bfaa]
19: /lib64/libc.so.6 (__libc_start_main+0x85) [0x7fd3d5c2c065]
20: X (_start+0x21) [0x5622e0a168e1]
Fatal server error:
Caught signal 6 (Aborted). Server aborting
This was a problem in the modesetting driver before, with the vrr private key. It didn't usually manifest because it is disabled by default, so the privates aren't initialized.
@metux Looking at the vrr privates, they are stored in the modesetting driver struct instead of being globals. Is this right?
Trying to debug this, when using current master, I get the error above when I don't enable vrr, but when I do enable vrr, the problem disappears.
ACK from me, patch looks good and fixes the issues I could reproduce on my side.
@metux What is the difference between gpu screens and regular screens here? Does the server split the screens into gpu and non-gpu based on hw acceleration or something?
How come this code works fine on Xorg?
@metux Looking at the vrr privates, they are stored in the modesetting driver struct instead of being globals. Is this right?
No, that's asking for trouble. We already had this discussion on some PR (don't recall exactly which one). the private key structs must always be global, because they can be accessed some time later, when driver structs already gone.
If one needs separate instances per screen, that's what dixRegisterScreenPrivateKey() et al are for.
EDIT: the scenario is: a driver needs to attach some private data to eg. a pixmap, but here one per screen (IOW: on n screens, this pixmap will have n different privates, one for each screen).
ACK from me, patch looks good and fixes the issues I could reproduce on my side.
@metux What is the difference between gpu screens and regular screens here? Does the server split the screens into gpu and non-gpu based on hw acceleration or something?
For multi-screen and offloading (eg. via prime). gpu-screens aren't directly visible (on the protocol), but are acting as slaves to the actual screens.
@metux ping