Possibility of multiprocess-safe logging without inheriting the logger?
I had problems with colorized output when using multiprocessing on Windows, and I found the solution here: https://github.com/Delgan/loguru/issues/108
Unfortunately in my case the creation of the multiprocessing-pool is within a library.
Now I can solve the problem by patching the library (I probably will) but I wonder if a solution without touching initargs is possible, for the following reason:
In 2019, when the issue linked above was raised, the newest stable Python version was 3.7 and this problem only affected Windows.
But with Python 3.8, macOS switched the default start method to spawn, and in Python 3.14 even Linux stops using fork as the default (it moves to forkserver), so the problem is no longer Windows-only and will eventually show up with the default settings on every platform. https://docs.python.org/3.12/library/multiprocessing.html
Here is an idea, though I don't know whether it is feasible with loguru's internals: multiprocessing processes know their parent process and can obtain its pid (even if they were spawned instead of forked). This unnecessarily wordy article shows it: https://superfastpython.com/multiprocessing-parent/ (This is a Python >= 3.8 feature, but since 3.7 is EOL I think that's OK.) So if the main process expects more processes to create their own global loguru loggers, it would be theoretically possible for it to set up a synchronization scheme in advance that the other processes discover and hook into when they initialize their global loguru logger.
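A minimal sketch of the parent lookup mentioned above, using only the standard library. The helper names (parent_pid, report, demo) are made up for illustration; only multiprocessing.parent_process() is the real API, and it returns None in the main process.

```python
import multiprocessing
import os


def parent_pid():
    """Return the pid of the process that started this one, or None in the main process."""
    parent = multiprocessing.parent_process()
    return None if parent is None else parent.pid


def report(queue):
    # Runs in the child: send back (own pid, parent pid as seen from the child).
    queue.put((os.getpid(), parent_pid()))


def demo():
    # "spawn" starts a fresh interpreter, so nothing is inherited through fork.
    ctx = multiprocessing.get_context("spawn")
    queue = ctx.Queue()
    child = ctx.Process(target=report, args=(queue,))
    child.start()
    child_pid, seen_parent = queue.get()
    child.join()
    # seen_parent is expected to equal the pid of the process that called demo().
    return child_pid, seen_parent, os.getpid()
```

demo() has to be called from a script under an `if __name__ == "__main__":` guard, because the spawn start method re-imports the main module in the child.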
Hi @julian-goettingen.
I agree that combining loguru and multiprocessing with the "spawn" start method is not very convenient as of today. I was planning some kind of new logger.connect() method to permit inter-process communication without inheriting the logger. The idea would be to communicate using a TCP socket: messages sent by child processes would be received and logged by the main one.
That would require the method to be called by both parent and child processes (with separate arguments to differentiate sender from receiver), which is not as straightforward as when the "fork" start method is used.
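To make the sender/receiver split concrete, here is a minimal standard-library sketch. It is not Loguru's API: serve_once, send_records and the newline-delimited JSON framing are assumptions chosen for illustration, and the receiver runs in a thread only so that the example fits in one file.

```python
import json
import socket
import threading


def serve_once(server_sock, sink):
    """Receiver side: accept one connection and pass each JSON line to sink."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # the loop ends when the sender closes the socket
            sink(json.loads(line))


def send_records(address, records):
    """Sender side: connect to the receiver and write one JSON document per line."""
    with socket.create_connection(address) as sock:
        for record in records:
            sock.sendall((json.dumps(record) + "\n").encode("utf-8"))


# Bind to localhost on a free port chosen by the OS.
server_sock = socket.create_server(("127.0.0.1", 0))
address = server_sock.getsockname()

received = []
receiver = threading.Thread(target=serve_once, args=(server_sock, received.append))
receiver.start()

send_records(address, [{"level": "INFO", "message": "hello from the sender"}])

receiver.join()
server_sock.close()
```

In a real setup the sender would live in the child process and would need to learn the address somehow, which is exactly the part that requires an explicit call on both sides.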
However, it's important to note that inheriting the logger should be preferred whenever possible, as it provides better performance than any other solution. I also plan to add a new logger.reinstall() method so that one could just start a process with initializer=logger.reinstall to make sure the inherited logger is globally available.
Regarding your proposal to develop a protocol allowing child processes to discover the parent logger, I don't have a clear idea of how to implement this in a robust way. I'm open to technical suggestions if you have any.
About the discovery-mechanism:
All three big operating systems have named pipes, so I thought something should be possible with that. The __main__-process could spawn a process that will be the sole true writer doing disk-IO. This process reads on a named pipe called <__main__-proc_id>-new. If any process initializes its loguru-logger, it can find the name of this <__main__-proc_id>-new pipe and send a signal to the writer process notifying it of its existence. Then, the two processes could create a new pipe, <__main__-proc_id>-<child_proc_id> or something, where the child could write its messages.
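A rough sketch of that rendezvous step, under several assumptions: it uses multiprocessing.connection from the standard library, which is backed by a named pipe on Windows and a Unix domain socket elsewhere; the channel naming scheme and the helpers channel_address, writer_handshake and announce are hypothetical; and the "child" is simulated by a thread so the example stays in one file (so the child pid equals the main pid here).

```python
import os
import sys
import tempfile
import threading
from multiprocessing.connection import Client, Listener


def channel_address(main_pid, suffix):
    """Build a well-known channel name from the main pid (hypothetical scheme)."""
    name = f"loguru-demo-{main_pid}-{suffix}"
    if sys.platform == "win32":
        return "\\\\.\\pipe\\" + name  # Windows named pipe
    return os.path.join(tempfile.gettempdir(), name)  # Unix domain socket path


def writer_handshake(listener, registered):
    """Writer side: accept one announcement, answer with a per-child channel name."""
    with listener.accept() as conn:
        child_pid = conn.recv()
        dedicated = channel_address(os.getpid(), child_pid)
        registered[child_pid] = dedicated
        conn.send(dedicated)


def announce(main_pid):
    """Child side: find the "-new" channel from the main pid and register."""
    with Client(channel_address(main_pid, "new")) as conn:
        conn.send(os.getpid())
        return conn.recv()


main_pid = os.getpid()
registered = {}
with Listener(channel_address(main_pid, "new")) as listener:
    writer = threading.Thread(target=writer_handshake, args=(listener, registered))
    writer.start()
    dedicated = announce(main_pid)  # a real child would call this after start-up
    writer.join()
```

A real child process could obtain main_pid from multiprocessing.parent_process().pid. The sketch stops at the handshake: it does not open the per-child channel, handle grandchildren, or authenticate the peer.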
Not sure how this would play out with multithreading, I don't use that at all and I don't know how loguru currently handles it, I am sure you have some complexity in place for that.
About performance: Why would inheriting the logger yield a better performance than a TCP protocol? My understanding is that loguru uses multiprocessing-queues and a multiprocessing-queue is just a pipe where we send pickled data. This article seems to suggest TCP can keep up with pipes: https://www.baeldung.com/linux/ipc-performance-comparison Is there something on the python side of things that I am not seeing?
Thanks for your input.
All three big operating systems have named pipes, so I thought something should be possible with that. The __main__-process could spawn a process that will be the sole true writer doing disk-IO. This process reads on a named pipe called <__main__-proc_id>-new. If any process initializes its loguru-logger, it can find the name of this <__main__-proc_id>-new pipe and send a signal to the writer process notifying it of its existence. Then, the two processes could create a new pipe, <__main__-proc_id>-<child_proc_id> or something, where the child could write its messages.
I think named pipe support in Python is very OS-dependent; for example, os.mkfifo() is only available on Unix. The mechanism you describe might work, but it seems to assume the child process always acts as a "client". There may be use cases for starting the logger "server" from a child process. This is the kind of edge case I find difficult to address without requiring the user to explicitly state the intent.
If inter-process communication without logger inheritance is needed by the user, I imagine something like this in main.py:
import multiprocessing
from loguru import logger
# logger.connect() is the proposed method discussed above; it does not exist yet.
if multiprocessing.parent_process() is None:
    logger.connect(is_server=True)  # Main process: will receive and log all messages
else:
    logger.connect(is_server=False)  # Child process: all logs are sent to the server
We can imagine doing this automatically if is_server is None, but that would still require explicitly calling logger.connect() in both child and parent.
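The automatic case could look roughly like the helper below. This is only a sketch: resolve_is_server is a hypothetical name, not part of Loguru, and it assumes that "main process" is the right default for the server role, which is the very assumption questioned above.

```python
import multiprocessing


def resolve_is_server(is_server=None):
    """Pick the role: an explicit choice wins, otherwise the main process is the server."""
    if is_server is None:
        # parent_process() returns None only in the main process.
        return multiprocessing.parent_process() is None
    return bool(is_server)
```

Passing True or False keeps the explicit override available for the case where the "server" should be started from a child process.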
Not sure how this would play out with multithreading, I don't use that at all and I don't know how loguru currently handles it, I am sure you have some complexity in place for that.
Yeah, that's likely not a big problem.
About performance: Why would inheriting the logger yield a better performance than a TCP protocol? My understanding is that loguru uses multiprocessing-queues and a multiprocessing-queue is just a pipe where we send pickled data. This article seems to suggest TCP can keep up with pipes: https://www.baeldung.com/linux/ipc-performance-comparison Is there something on the python side of things that I am not seeing?
Yes you're right, Loguru uses a pipe internally when enqueue=True and the logger is inherited.
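The "pipe carrying pickled data" description can be checked with the standard library alone. The sketch below says nothing about Loguru's internals; it only shows that Connection.send() on a multiprocessing pipe writes a pickle that the other end can read either as raw bytes or as the original object.

```python
import multiprocessing
import pickle

# duplex=False gives a one-way pipe: the first end receives, the second sends.
reader, writer = multiprocessing.Pipe(duplex=False)

record = {"level": "INFO", "message": "hello"}

# send() pickles the object; recv_bytes() exposes the raw payload.
writer.send(record)
raw = reader.recv_bytes()
decoded = pickle.loads(raw)

# The usual pairing: send() on one end, recv() on the other.
writer.send(record)
received = reader.recv()

reader.close()
writer.close()
```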
The Python documentation seems to recommend pipe over socket for performance:
If you need fast IPC between two processes on one machine, you should look into pipes or shared memory. If you do decide to use AF_INET sockets, bind the “server” socket to 'localhost'. On most platforms, this will take a shortcut around a couple of layers of network code and be quite a bit faster.
There are also benchmarks that suggest that pipes are faster than sockets, although I agree the difference is not very significant.
I also suspect I might need to implement "digest authentication" for security reasons, as done by Listener and Client, which will add overhead.
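For reference, this is roughly what the authentication handshake looks like with Listener and Client from multiprocessing.connection. The shared secret, the receive_one helper and the message format are placeholders for illustration; the receiver runs in a thread only to keep the sketch in one file.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo-secret"  # placeholder shared secret, both sides must know it


def receive_one(listener, out):
    # accept() performs the challenge/response handshake before returning.
    with listener.accept() as conn:
        out.append(conn.recv())


received = []
# Port 0 lets the OS pick a free port; listener.address reports the real one.
with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
    receiver = threading.Thread(target=receive_one, args=(listener, received))
    receiver.start()
    with Client(listener.address, authkey=AUTHKEY) as conn:
        conn.send({"level": "INFO", "message": "authenticated hello"})
    receiver.join()
```

A client connecting with a different authkey is rejected with multiprocessing.AuthenticationError, and the handshake costs a few extra round trips per connection rather than per message.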
I see your concerns and why you want the explicit connect-method. I don't see a better way either.
I do believe that a lot of people use loguru because it is so good out-of-the-box and configuration is optional. It really replaces print in a way most logging-libraries in most languages don't.
But then again, everyone who does parallel programming is aware that print will not really work either, so I guess this is fine.