multiprocessing-logging
Logging using importlib doesn't work
I am currently using `multiprocessing.Pool` to read and prepare data for a machine learning task. I tried to install the logging hook right after my program starts: I first configure logging with `logging.basicConfig` and call `install_mp_handler` afterwards. After that I create multiple `Pool`s and pass down the `initializer` argument.
I do, however, load modules dynamically using `importlib.import_module`. Each of the loaded modules initializes a logger right at the start of the file:

```python
import logging  # plus the module's other import statements

log = logging.getLogger(__name__)

# called within the context of a Pool.imap call
def called_from_imap(file):
    pass

# called from the main thread, without a Pool
def called_from_main():
    pass
```
I suspect that importing the modules via `importlib` prevents this library from working properly. The strange part is, though, that each of the imported modules has two functions: the first gets called within the context of a `Pool.imap` call, the second one gets called in the same thread as the main function, without a `Pool` being created. The second one does log everything correctly.
Any idea what the problem might be?
Python: 3.6.8
OS: Windows
The `initializer` argument that you pass is unrelated to this library, right?
I don't see how using `importlib` should affect logging. Maybe it's something Windows-specific. If that's the case, I cannot help.
Do you happen to have a minimal case to reproduce this problem that I could try to run locally?
I found this issue today and tried it out. It didn't work for me. I pass `install_mp_handler` as they did.
I created a MWE here. The weird thing is though, it even works without calling the install hook. I forgot to initialize your library but still got the output of all workers in the file.
And my production code is basically the same concept as the MWE. I am currently writing and uploading the code on Linux, though. I will be home later and can try it on Windows too.
Weird that it even works without using your module...
@jruere I just checked it on a Windows PC and it doesn't work. So it is a Windows specific "bug". Do you have an idea what the cause would be? Is it the fact that processes might be forked differently on Windows and Linux?
Depending on how the fork is done (or whatever is used on Windows), child processes will write to the file. The difference is that they all write concurrently so records could be corrupted.
I don't understand why passing `initializer=multiprocessing_logging.install_mp_handler` would be necessary. Since initialization should happen only once, I think it's incorrect.
I don't know if the original report is installing the handler on the initializer as well.
I don't think that it is necessary at all. I just copy pasted what the other person did to make it work. But it didn't work for me.
Can I ask how the library works exactly? I checked the code, but I don't know the specifics of the logging framework well enough to understand it completely.
How does the syncing of the forked processes actually work? I saw that you use a `Queue`; does a `Queue` have access across multiple Python instances?
This is not a good medium for explaining. I'll write an explanation and link it here but it won't be done soon.
I believe I have a similar problem. It looks like using `importlib.import_module` adds a handler (or several, I'm not sure yet) to the root logger of `logging`.
This makes

```python
logging.basicConfig(
    level=logging.INFO,
)
```

do nothing, and I cannot access logs of level under WARNING.
As per its docstring:

> This function does nothing if the root logger already has handlers configured, unless the keyword argument *force* is set to True
We can check that the handler appears on `import_module` by looking at `logging.getLogger().handlers` before and after the call.
The solution I found: either set `force=True` in `logging.basicConfig`, or run

```python
# iterate over a copy; removing while iterating would skip handlers
for handler in logging.getLogger().handlers[:]:
    logging.getLogger().removeHandler(handler)
```

after the call to `import_module`.