multiprocessing-logging
Logging using importlib doesn't work
I am currently using `multiprocessing.Pool` to read and prepare data for a machine learning task. I tried to install the logging hook right after my program starts: I first configure logging with `logging.basicConfig` and call `install_mp_handler` afterwards. After that I create multiple `Pool`s and pass down the `initializer` argument.
I do, however, load modules dynamically using `importlib.import_module`. Each of the loaded modules initializes a logger right at the start of the file:

```python
import logging  # plus the module's other import statements

log = logging.getLogger(__name__)

# called within the context of a Pool.imap call
def called_from_imap(file):
    pass

# called from the main thread, without a Pool
def called_from_main():
    pass
```
I suspect that importing the modules via `importlib` prevents this library from working properly. The strange part is, though, that each of the imported modules has two functions: the first gets called within the context of a `Pool.imap` call, the second one gets called in the same thread as the main function, without a `Pool` being created. The second one does log everything correctly.
Any idea what the problem might be?
Python: 3.6.8
OS: Windows
The `initializer` argument that you pass is unrelated to this library, right?
I don't see how using `importlib` should affect logging. Maybe it's something Windows-specific. If that's the case, I cannot help.
Do you happen to have a minimal case to reproduce this problem that I could try to run locally?
I found this issue today and tried it out. It didn't work for me. I pass `install_mp_handler` as they did.
I created a MWE here. The weird thing is though, it even works without calling the install hook. I forgot to initialize your library but still got the output of all workers in the file.
And my production code is basically the same concept as the MWE. I am currently writing and uploading the code on Linux, though. I will be home later and can try it on Windows too.
Weird that it even works without using your module...
@jruere I just checked it on a Windows PC and it doesn't work. So it is a Windows specific "bug". Do you have an idea what the cause would be? Is it the fact that processes might be forked differently on Windows and Linux?
Depending on how the fork is done (or whatever is used on Windows), child processes will write to the file. The difference is that they all write concurrently so records could be corrupted.
I don't understand why passing `initializer=multiprocessing_logging.install_mp_handler` would be necessary. Since initialization should happen only once, I think it's incorrect.
I don't know if the original report is installing the handler on the initializer as well.
I don't think that it is necessary at all. I just copy pasted what the other person did to make it work. But it didn't work for me.
Can I ask how the library works exactly? I checked the code, but I don't know the specifics of the logging framework well enough to understand it completely.
How does the syncing of the forked processes actually work? I saw that you use a `Queue`; does a `Queue` have access across multiple Python instances?
This is not a good medium for explaining. I'll write an explanation and link it here but it won't be done soon.
I believe I have a similar problem. It looks like using `importlib.import_module` adds a handler (or several, I'm not sure yet) to the root logger of `logging`.
This makes

```python
logging.basicConfig(
    level=logging.INFO,
)
```

do nothing, and I cannot access logs of level under WARNING.
As per its docstring:

> This function does nothing if the root logger already has handlers configured, unless the keyword argument *force* is set to True
We can check that the handler appears on `import_module` by looking at `logging.getLogger().handlers` before and after the call.
The solution I found: either set `force=True` in `logging.basicConfig`, or run

```python
# iterate over a copy; removing while iterating would skip handlers
for handler in logging.getLogger().handlers[:]:
    logging.getLogger().removeHandler(handler)
```

after the call to `import_module`.