mamba icon indicating copy to clipboard operation
mamba copied to clipboard

Forward signals to child process with micromamba run

Open MihaiBSony opened this issue 1 year ago • 9 comments

Hi! This MR tries to fix issue #1820, in which signals are not forwarded to the child process. I think this should count as a first draft - in the final version it's probably a good idea to save the state of the signal handlers and restore it once the call is over, instead of just overwriting them. Looking forward to your feedback!

P.S.: I don't know much about Windows, so I didn't even try to handle that case.

MihaiBSony avatar Jan 24 '24 21:01 MihaiBSony

Do have more understanding of why signal forwarding works on Linux and macOS but not Windows? Should this patch be Windows-only instead?

jonashaag avatar Jan 25 '24 08:01 jonashaag

My changes don't touch windows at all (almost all changes are inside a #ifndef _WIN32 guard). Previously, the only signal that was somewhat working on Linux was SIGTERM, because micromamba was adding a signal handler for it (see removed code in the PR), which in turn was stopping the child process with a SIGTERM. This PR attempts to forward all signals to the child, except for SIGCHLD.

MihaiBSony avatar Jan 25 '24 08:01 MihaiBSony

The context where I encountered this as an issue is a setup involving SLURM and pytorch-lightning. There I needed the python interpreter to receive the SIGUSR1 signal, but, whenever I ran it as micromamba run -n env python script.py, sending SIGUSR1 to micromamba would just terminate the micromamba process and the child, since that is the default action for SIGUSR1 according to the signal(7) manpage.

While working on the issue I got some inspiration from the source code of the Singularity/Apptainer containerization tool, since micromamba run seems to roughly have a similar purpose as singularity exec. There they seem to handle signals in a similar way as this PR, forwarding them to the child process (if I understood the code correctly). See here.

MihaiBSony avatar Jan 25 '24 09:01 MihaiBSony

Thanks a lot for the context!

We definitely need to add tests for this feature. I guess it could be as simple as using micromamba run to start a Python script that prints whatever signals it receives, and using that output in the test.

That could also help understanding the Windows status quo :)

jonashaag avatar Jan 25 '24 09:01 jonashaag

Hi @jonashaag!

I made the code more robust and added unit tests for signal forwarding and process suspension (with CTRL-Z). Unfortunately the code is using a lot of Linux syscalls, so not even the unit tests are usable for windows. Could you take a look at it and maybe tag other people who have more knowledge of signal handling or linux processes? In particular, I'm not sure if the way I'm dealing with the previous signal handler and the signal mask is correct, and if I'm waiting correctly for the child process. Also, it might make sense to move some of the new code to separate functions to keep things tidy.

On the plus side, I already tested the binary in my setup and it works for my use case! 😀

MihaiBSony avatar Jan 31 '24 15:01 MihaiBSony

What signal handler do we have installed in this loop?

// Set handlers for all signals and save the previous state.

Ie. what are the previous handlers? If this always the fixed list from set_signal_handler? If so, I guess we could simplify the code by making it less generic?

jonashaag avatar Jan 31 '24 17:01 jonashaag

What signal handler do we have installed in this loop?

// Set handlers for all signals and save the previous state.

Ie. what are the previous handlers? If this always the fixed list from set_signal_handler? If so, I guess we could simplify the code by making it less generic?

When used in our code, I think only the so-called default handler is set. If, on the other hand, run_in_environment is used from third-party code (which could be the case, since it's in libmamba?), then we can't make any assumptions about the signals.

Also, there's actually one corner case which I don't handle and which seems impossible to handle without reworking other parts of the code. If someone uses set_signal_handler to set a non-default handler, I can't restore it at the end of the run_in_environment call, and my code just sets the default handler instead.

The so-called signal handlers from set_signal_handler are actually just threads blocking on a sigwait call. We could save the function that is passed as an argument, but if it's third-party code and has any side-effects before getting to sigwait, the behaviour would be pretty hard to predict. This seems to rather be a design flaw in the set_signal_handler functionality, but maybe there's a reason why we don't just use the normal linux signal handlers instead?

MihaiBSony avatar Feb 05 '24 09:02 MihaiBSony

It seems that set_signal_handler was introduced in https://github.com/mamba-org/mamba/pull/1193. Can you please check @MihaiBSony's last comment and let us know if there is a way to simplify the signal handling here? Especially for the run_in_environment API. @adriendelsalle @AntoinePrv

jonashaag avatar Feb 05 '24 12:02 jonashaag

I don't know much about the signal handlers but I know they must be forwarded to python when used in libmambapy.

AntoinePrv avatar Feb 05 '24 15:02 AntoinePrv