[bug] psutil doesn't always kill unix-based processes (killing Java when the server stops hangs)

Open kokofixcomputers opened this issue 7 months ago • 42 comments

Describe the bug: Using Telepath on macOS, pressing Command + Q shuts down the server, but it then thinks the server is in a deadlocked state

To Reproduce Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

auto-mcs Configuration

  • Clarify if you're using Telepath: YES
  • Clarify if you're using the GUI, headless, or Docker: GUI

Operating System and platform (please complete the following information):

  • OS: host: Windows 11, Telepath client: macOS Sonoma
  • Architecture: host: x64, Telepath client: arm64
  OS       Path
  Windows  %appdata%\.auto-mcs\Logs
  macOS    ~/Library/Application Support/auto-mcs/Logs
  Linux    ~/.auto-mcs/Logs

Expected behavior It properly shuts down the server

Screenshots

Image

Additional context: Sorry if I'm creating too many issues

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

This isn't a bug with auto-mcs, it's a bug with Fabric and whatever mods you're using. Fabric has an odd shutdown process, and you can close the server by pressing the skull to kill it, or by pressing CMD-Q again. This does not happen in Vanilla. It's simply added functionality that detects when the server should be closed, but the server doesn't close itself

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

@macarooni-man I tried to press the skull and Command + Q nothing happens.

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@kokofixcomputers please provide detailed reproduction steps including how you made the server, your OS version, and I'll reopen if I can reproduce it

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

@macarooni-man Thanks:

  • OS: Windows machine as host, Mac as Telepath client
  • Using Telepath: YES
  • Server type: Fabric

Reproduce steps:

  1. Have telepath setup
  2. Create a fabric server with the version 1.21.4
  3. Have the server running for a bit
  4. Go into the telepath client and press Command + Q
  5. Sometimes it works, sometimes it doesn't (most times it doesn't)

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@kokofixcomputers does this happen locally as well? And what about on Vanilla over Telepath?

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

I'm testing

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@kokofixcomputers thank you!

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

Np :)

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

Thanks for this awesome application!

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

Huh? Now it's stuck on Creating initial backup while the progress bar is green and says 100%

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

The problem is only with Fabric. @macarooni-man I will try a few more times to see if it still works, but I think the kill button never worked for me

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

9:01:26 AM [INIT] > 'Survival' has stopped successfully
9:01:28 AM [WARN] > 'Survival' is deadlocked, please kill it above to continue...

well it does say has stopped successfully for some reason

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@macarooni-man The only way to resolve this is to force quit the host???

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@kokofixcomputers, I guess what I'm asking is this:

  1. Does this happen in Vanilla over Telepath?
  2. Does this happen when running Fabric locally?

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

@kokofixcomputers, I guess what I'm asking is this:

  1. Does this happen in Vanilla over Telepath? NO
  2. Does this happen when running Fabric locally? YES

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@macarooni-man I asked AI to see why killing is not working:

2. Process Name Matching Might Be Insufficient

Your code checks for java.exe (Windows) or java (Linux/macOS) by name. However, sometimes the Minecraft server may run as javaw.exe on Windows, which your code does not account for.

If the process name differs, your script will not find and kill the correct process.

Hopefully this helps you solve this problem!

Thanks for making and maintaining an awesome app!

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

thanks @kokofixcomputers!

Actually, auto-mcs has complete control over the Java wrapper, and it launches specifically from an internally managed Java environment. It's only an issue with Fabric, and since it only happens on macOS and Linux, my guess is that it's caused by the kill command sending the wrong termination signal to the process. I need to force a SIGTERM, and I'm using psutil for that. It's possible there is a better solution, but I'd have to look into this more
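
For reference, on Unix a SIGTERM is only a request that the process can catch or ignore, while SIGKILL cannot be caught. psutil's Process.terminate() and Process.kill() send those two signals respectively on Unix (on Windows both end the process directly). Below is a minimal sketch of the usual "ask first, then force" pattern. It uses the standard library's subprocess module rather than psutil, and stop_process and grace are hypothetical names for illustration, not auto-mcs code:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask a child process to exit, then force it if it ignores the request."""
    proc.terminate()  # Unix: SIGTERM, which the child may catch or ignore
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # Unix: SIGKILL, which cannot be caught
        return proc.wait()
```

A stand-in for the Java server could be started with subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"]) and passed to stop_process. If a server hangs during its own shutdown hooks, a SIGTERM alone may never finish the job, which is why the sketch escalates after the grace period.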

macarooni-man avatar Apr 20 '25 16:04 macarooni-man

Hmm... I think the issue also occurs on windows

kokofixcomputers avatar Apr 20 '25 16:04 kokofixcomputers

@kokofixcomputers can you send an .amb backup of your server? I'm unable to reproduce this on a stock Fabric server

macarooni-man avatar Apr 20 '25 17:04 macarooni-man

It takes a while. I think it happens after letting the server run for like 30 min. So, take your time. No rush. Can I send you the backup file through email? GitHub doesn't accept the file format.

kokofixcomputers avatar Apr 20 '25 17:04 kokofixcomputers

It's too large for GitHub; it's not the file format. You can upload it to a file sharing service and send the link, but if it only happens after 30 minutes, that's not something I wish to spend my time troubleshooting. I hope you can understand lol. If you can find another way to make it happen immediately, I'm happy to spend my time troubleshooting

macarooni-man avatar Apr 20 '25 17:04 macarooni-man

@macarooni-man Hey! So I ran a bunch of other tests, and apparently the kill button itself is bugged. It won't kill Vanilla either. So maybe the issue with Fabric not being killable stems from how auto-mcs handles processes in general, and not just from how it handles Fabric?

I tried digging into AutoMCS's code but I still don't see how menu.py attempts to kill it.

Thanks!

kokofixcomputers avatar Apr 21 '25 14:04 kokofixcomputers

@kokofixcomputers can you provide any other information about the setup? What OS is this on, and more importantly, is it over Telepath?

macarooni-man avatar Apr 21 '25 14:04 macarooni-man

I tried both with and without Telepath, and the kill button is broken in both cases. I only tried Windows as the host, with macOS as the Telepath client.

kokofixcomputers avatar Apr 21 '25 14:04 kokofixcomputers

@macarooni-man And apparently, when using the kill button, Java is killed successfully, but the console window host and other processes are not.

Image

Some of these didn't exist before starting the server. Maybe it didn't register as killed just because other processes are still running?
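
To illustrate the "other processes are still running" theory: killing only the top-level process can leave its descendants behind. A rough sketch of killing a whole process tree is below, using only the standard library. start_in_own_group and kill_tree are hypothetical names, this is not how auto-mcs does it (the thread says it uses psutil), and it assumes the process was started by the same script:

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd: list[str]) -> subprocess.Popen:
    """Start cmd so that it and its descendants can be stopped together."""
    # On Unix the child becomes the leader of a new session and process group.
    return subprocess.Popen(cmd, start_new_session=(os.name != "nt"))


def kill_tree(proc: subprocess.Popen) -> None:
    """Force-kill proc and everything it spawned."""
    if os.name == "nt":
        # /T includes child processes, /F forces termination.
        subprocess.run(["taskkill", "/F", "/T", "/PID", str(proc.pid)], check=False)
    else:
        try:
            # The group id equals proc.pid because of start_new_session.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # nothing left in the group
    proc.wait()
```

On Unix this only reaches descendants that stayed in the original process group; a child that moved itself to a new session would survive.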

kokofixcomputers avatar Apr 21 '25 14:04 kokofixcomputers

@kokofixcomputers It's impossible to narrow this down without detailed information or a server backup. I'm unable to reproduce this issue, and it leads me to believe it might be a conflict with how the subprocess module interacts on your system. Can you please try the following cases and additionally send me the .amb file of that fabric server?

Additionally, are you using the release binary, or the fork you created? Because disabling recursive processes like I suggested yesterday would lead to this exact behavior

  • Windows Vanilla (no Telepath)
  • Windows Fabric (no Telepath)

And try these cases on another system if possible

macarooni-man avatar Apr 21 '25 14:04 macarooni-man

  • Windows Vanilla (no Telepath): kill button doesn't work
  • Windows Fabric (no Telepath): kill button doesn't work
  • macOS Sonoma Vanilla (no Telepath): kill button doesn't work

@macarooni-man This is what I got.

If you want the file to the Fabric server: https://mega.nz/file/oxgiWJoT#gOou6byphwtckbM1Cwo4QE6biQeEUjJ0Yu43bIlEGbo

kokofixcomputers avatar Apr 21 '25 15:04 kokofixcomputers

I caught something during testing:

Image

The timer is still going up?!

and:

Image

The ip is still there? But it thinks its off?

Well... it says deadlocked from a Telepath client, so at least this is getting somewhere

kokofixcomputers avatar Apr 21 '25 15:04 kokofixcomputers

I am using the release binary. Not the fork

kokofixcomputers avatar Apr 21 '25 15:04 kokofixcomputers

There are two issues in particular, and in this case @kokofixcomputers is experiencing both:

  • Fabric, with certain mods, doesn't actually close the server when stopped. It gets hung on a background thread that doesn't seem to close on Windows utilizing the taskkill /f command

  • On Telepath, sometimes the client's panel doesn't reset after the server stops. Consider looking into the ConsolePanel.reset_panel() method
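
The first point matches the "has stopped successfully" followed by "is deadlocked" log lines earlier in the thread: a JVM stays alive for as long as any non-daemon thread is running, even after the server has announced that it stopped. The sketch below reproduces that pattern with a Python stand-in, since Python waits for non-daemon threads in the same way. FAKE_SERVER and stop_or_flag_deadlock are hypothetical names for illustration and do not reflect the actual auto-mcs detection code:

```python
import subprocess
import sys

# Stand-in for a modded server: the main thread announces shutdown and returns,
# but a leftover non-daemon thread keeps the process alive.
FAKE_SERVER = """
import threading, time
threading.Thread(target=time.sleep, args=(60,)).start()
print("stopped", flush=True)
"""


def stop_or_flag_deadlock(proc: subprocess.Popen, grace: float = 2.0) -> bool:
    """Return True if the process outlived its own shutdown message."""
    proc.stdout.readline()  # block until the "stopped" line arrives
    try:
        proc.wait(timeout=grace)
        return False  # exited on its own
    except subprocess.TimeoutExpired:
        proc.kill()  # still alive after the grace period: force it
        proc.wait()
        return True
```

Starting the stand-in with subprocess.Popen([sys.executable, "-c", FAKE_SERVER], stdout=subprocess.PIPE, text=True) and passing it to stop_or_flag_deadlock should report a deadlock, because the process prints its shutdown message but never exits by itself.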

macarooni-man avatar Apr 21 '25 22:04 macarooni-man