Process: kill then WAIT for child, in reverse. fixes #777
Children process don't always exit due to parent-child relationship or subreaper behavior.
Kill the children (in reverse) and WaitForChild() seems to allow processes to quit as expected and not be left lingering.
I've tested this on a few games in my library. My test cases:
-
Quit game normally through the menu: If game closes normally it seems everything is fine, all relevant processes are cleaned up and the green play button appears.
-
Mnimize game and quit from the desktop taskbar: If game closes then everything is ok again but often a game will freeze and gamescope will freeze as well. Then i have to kill gamescope and processes are left running.
-
Quit game via Steam's stop button: If i don't quit ingame but use Steam's stop button all processes quit but then it goes even further and Steam itself quits every time.
-
Test game with --mangoapp: With --mangoapp almost all my games either freeze or window closes but Steam still thinks game is running. This does not happen with regular gamescope
- 1 same
- 2 i think is actually the same without this patch, except steam doesn't realize the process is still running
- 3 does not kill steam for me. im running on top of gamescope 3.16.14
- 4
gamescope --mangoapp -- %command%looks like it kills for me in steam but the process is still running, similar to point 2 without the patch
only really tried mirrors edge
thanks ill try to look more when i get some time
K, I axed the recursive part of KillProcessTree() which I think is where the headache comes from. Was able to start and hotkey-kill MirrorsEdge without issues through steam (with and without --mangoapp), and manual wine runs from terminal.
For what it's worth --mangoapp still causes me trouble. After game quits i still have to use Steam's stop button(although this time Steam doesn't close). But now if i instead explicitly close gamescope's window Steam is killed. Without --mangoapp everything seems to be fine. Also tested with Mirror's Edge, this is the process list that remains:
37093 /bin/bash /usr/bin/gamescope -f --mangoapp -- /home/iod/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- /home/iod/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=17410 -- /home/iod/.local/share/Steam/steamapps/common/SteamLinuxRuntime_sniper/_v2-entry-point --verb=waitforexitandrun -- /home/iod/.local/share/Steam/steamapps/common/Proton 10.0/proton waitforexitandrun /home/iod/.local/share/Steam/steamapps/common/mirrors edge/Binaries/MirrorsEdge.exe
37095 gamescope -f --mangoapp -- /home/iod/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- /home/iod/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=17410 -- /home/iod/.local/share/Steam/steamapps/common/SteamLinuxRuntime_sniper/_v2-entry-point --verb=waitforexitandrun -- /home/iod/.local/share/Steam/steamapps/common/Proton 10.0/proton waitforexitandrun /home/iod/.local/share/Steam/steamapps/common/mirrors edge/Binaries/MirrorsEdge.exe
37130 gamescopereaper --respawn -- mangoapp
37132 mangoapp
Steam's stop button kills them all.
Went back to killing in reverse, made sure it shouldn't be off-by-1. Looks like no mangoapp or gamescope* to me?
Thank you! Without this fix, gamescope consistently hangs for me after quitting Mod Organizer 2, with this fix that never happens. Works with MangoHud for me too.
@zlice, sorry if I'm missing something, but it looks like you might also have to call KillProcessTree in a loop until all children are dead.
The reason why is that after a quick look, it seems like GetChildPids only returns immediate children. Gamescope is a sub-reaper, so when one of its child processes dies, they will be re-parented to Gamescope, but won't get killed since they only become immediate children of Gamescope after the call to GetChildPids.
Could you explain the reason for iterating in reverse? It seems like that might be related to what I'm describing here, but I also didn't look very closely so I could be completely wrong!
From what I remember/understood, parents should kill their children, but going forward from parent->children1->children2 would just kill the parent, maybe the 2nd layer of children, then leaves others running for some reason (depending on ??? what launched what? like mangohud launches another process? or I'm not sure exactly). But killing them in reverse will kill what mangohud launched (your game) and then mangohud, and then the parent appropriately.
The KillProcessTree is called from KillAllChildren which has a list of children from GetChildPids which I think was ordered and contains all children (from parent->children1->children2). This was called in some kind of loop but not seemingly working and leaving processes running because parent would die, maybe children1, and then well there's no children2 for children1 because you killed it and there's nothing left to find children for or something?
@zlice This made me interested, and I looked a little more closely. I'm pretty sure GetChildPids isn't getting grandchildren, etc. ("parent->children1->children2" as you say), so I think the recursive part has to stay (although leaving it out doesn't seem to cause problems often as most games aren't creating deeply nested processes).
I am able to reproduce the bug described (I run portal 1 in gamescope with --mangoapp and close the game by closing the gamescope window).
To try to get to the bottom of what's happening, I modified src/Apps/gamescopereaper.cpp to kill children with SIGKILL instead of SIGTERM and after this change, nothing was hanging. Also, gamescopereaper already appears to wait properly for all children to exit.
My current (possibly wrong) intuition for the problem is that under some (???) condition, many games don't exit properly and hang. gamescopereaper doesn't have any sort of timeout before it force kills children, so it will also hang waiting for them to exit. For some reason, iterating over the children in reverse order prevents the condition that causes games to hang from happening.
Possibly related: When I reproduced the issue, it seemed to coincide with closing gamescope before the game itself exits. It looks like at some point, gamescope was changed so that a separate "gamescopereaper" process is the sub-reaper instead of gamescope itself. I wonder if this is part of the problem.
In any case, I think that the recursive part of KillProcessTree needs to stay in order to correctly kill all children in all cases.
I thought the grandchildren were left hanging because the secondary process like mangohud was spawning a fork or subreaping on its own. Mirrors edge (steam) with gamescope and mangohud was doing this for me as well as an old solitare from windows.
It does seem like some special setup of game/tool or maybe even system/library but I'm not sure what it is either.
Not entirely sure what I'm "killing" as it's usually with a hotkey, so I assume it's gamescope. But the games like Mirrors Edge (steam) or solitare (wine) will seem to be gone and have no GUI but ps or htop shows they are still running, and require a kill.
I can try to put the ``KillProcessTree` back in the loop but I believe it crashes when doing things in reverse and wasn't working when going forward. Could have messed something up though. (Not sure when I can get around to it either btw/fyi)
games like Mirrors Edge (steam) or solitare (wine) will seem to be gone and have no GUI but
psorhtopshows they are still running, and require a kill.
This is what I found as well. When I changed the code to kill with SIGKILL instead of SIGTERM, the processes were killed correctly. Therefore, I think gamescopereaper is correctly trying to kill all the child processes, but there's something going on that can cause them to hang when killed with SIGTERM (which allows a process to do its own cleanup before exiting, SIGKILL just stops the process).
The gamescopereaper is already correctly waiting on child processes to exit, so I don't think an extra wait needs to be added to KillProcessTree.
Ah okay, gotchya. Interesting. That was the only change for that test then right? SIGKILL vs SIGTERM?
I think there are some other patches applied to the source tree I was messing with, but yes: before I made that single change I could reproduce the bug with the setup I described above and after it I could not.
I can confirm that the current patch fixes the 'gamescope doesn't exit' problem I've had persistently. This is built against 3.16.15.
This patch really should be merged into the next release. Any chance of progress on that front?
So how do we bring Valve's attention to this patch?
@misyltoad hi, possible to have feedback on this PR?
@misyltoad Please, we beg of you, this is an essential fix for using gamescope outside steam
... we 100% need a community fork of gamescope