distributed Include worker stdout in log

[x] Closes #2033
[ ] Tests added / passed
[x] Passes black distributed / flake8 distributed

I was asking about how to make print output to worker logs in https://github.com/dask/dask/discussions/7202 and @jrbourbeau pointed me to #2033. I took @jakirkham's idea in this comment https://github.com/dask/distributed/issues/2033#issuecomment-395483636 and made it just do strout.

I imagine this behavior should be configurable, but it's not clear to me where that config should live. I originally thought it could be another level of logging called "STDOUT" or something that you'd use instead of "INFO". But that is kind of strange. I do think this is something that we'd want to configure globally.

Feb 10 '21 15:02 jsignell

FWIW, I'm dealing more with GDAL these days, which prints a lot of stuff to stdout / stderr rather than raising exceptions. So I'd be happy to see this if there aren't any downsides.

(But perhaps it should be a config option, in case there are downsides?)

Feb 10 '21 15:02 TomAugspurger

(But perhaps it should be a config option, in case there are downsides?)

I agree that it should be a config option. Especailly since it's a big change in behavior, so it'd be nice to have it not be on by default to start.

Feb 10 '21 15:02 jsignell

With C/C++ libraries we might try wurlitzer, which does some file descriptor magic to enable redirection

https://github.com/minrk/wurlitzer

Feb 10 '21 16:02 jakirkham

👍 to capturing stdout / stderr in logs!

What if I want to see the output of a particular python logger in my Dask worker logs? Like, say there is a package that does

import logging
logger = logging.getLogger("foo")

def do_stuff():
    logger.info("Hello world")

In a standard script, I can do something like

logger = logging.getLogger("foo") 
handler = logging.StreamHandler() 
handler.setLevel(logging.INFO)
logger.addHandler(handler)

in order to see the logs.

Now imagine that the do_stuff function is called within a task on Dask worker... How can I make the log output show up in the worker logs? Would this PR support that?

xref https://github.com/pangeo-forge/pangeo-forge/issues/84

Sorry if this is off topic. Happy to move to a separate issue if needed.

Apr 01 '21 15:04 rabernat

It's been a while since I wanted to create an issue or comment on a open one on this subject, 👍 from me also!

Apr 03 '21 20:04 guillaumeeb

Any progress on this? I would be willing to help if needed

May 11 '21 14:05 AndreaGiardini

@AndreaGiardini I wasn't sure what the best approach would be to make this configurable. Do you have ideas about how this functionality should work?

May 24 '21 15:05 jsignell

I'm not a dask expert by any means but maybe we could consider something like:

Adding a include_stdout flag in the [logging] section of the dask configuration
include_stdout is a boolean, defaults to False (to keep current behaviour)
if include_stdout = True we can redirect stdout / stderr to a python logger (maybe we can have an separate include_stderr ?)

What do you think?

May 25 '21 07:05 AndreaGiardini

Putting this in without allowing configuration is probably actually dangerous. If there are applications runnign which print a lot, this would suddenly cause a lot of logs which might otherwise not be handled properly (file log rotation not setup or log aggregation servers are overloaded, etc.). The question of how this configuration should then look like is also interesting since we might have different log handlers, log levels, etc. IIUC this is about user code prints being forwarded to the loggers, isn't it? What about offering a util function which defines a ctx manager to do so?

May 25 '21 08:05 fjetter

Yeah this PR is really meant as a starting point and is not mergeable until there is a config setting. I think a context manager sounds like a much better idea @fjetter. Another potential idea is to have a worker plugin or something similar that the user has to explicitly add.

Jun 03 '21 20:06 jsignell