Include worker stdout in log
- [x] Closes #2033
- [ ] Tests added / passed
- [x] Passes `black distributed` / `flake8 distributed`
I was asking about how to make print output go to worker logs in https://github.com/dask/dask/discussions/7202 and @jrbourbeau pointed me to #2033. I took @jakirkham's idea in this comment https://github.com/dask/distributed/issues/2033#issuecomment-395483636 and made it just do stdout.
I imagine this behavior should be configurable, but it's not clear to me where that config should live. I originally thought it could be another level of logging called "STDOUT" or something that you'd use instead of "INFO". But that is kind of strange. I do think this is something that we'd want to configure globally.
FWIW, I'm dealing more with GDAL these days, which prints a lot of stuff to stdout / stderr rather than raising exceptions. So I'd be happy to see this if there aren't any downsides.
(But perhaps it should be a config option, in case there are downsides?)
I agree that it should be a config option. Especially since it's a big change in behavior, it'd be nice to have it off by default to start.
With C/C++ libraries we might try wurlitzer, which does some file descriptor magic to enable redirection
https://github.com/minrk/wurlitzer
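For example, something along these lines (a sketch based on wurlitzer's README; `call_into_gdal` is a stand-in for any chatty C/C++ call):

```python
import logging
from wurlitzer import pipes

logger = logging.getLogger("distributed.worker")

# wurlitzer redirects the process-level stdout/stderr file descriptors,
# so even output from compiled extensions is captured
with pipes() as (out, err):
    call_into_gdal()  # hypothetical C/C++ library call that prints

for line in out.read().splitlines():
    logger.info(line)
```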
👍 to capturing stdout / stderr in logs!
What if I want to see the output of a particular python logger in my Dask worker logs? Like, say there is a package that does
```python
import logging

logger = logging.getLogger("foo")

def do_stuff():
    logger.info("Hello world")
```
In a standard script, I can do something like
```python
logger = logging.getLogger("foo")
handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
logger.addHandler(handler)
```
in order to see the logs.
Now imagine that the do_stuff function is called within a task on a Dask worker... How can I make the log output show up in the worker logs? Would this PR support that?
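The closest workaround I can think of is attaching the handler on every worker with `client.run` (untested sketch; the scheduler address below is made up):

```python
import logging
from distributed import Client

def setup_foo_logging():
    # Runs once in each worker process
    logger = logging.getLogger("foo")
    handler = logging.StreamHandler()  # defaults to stderr, which worker logs capture
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

client = Client("tcp://scheduler:8786")  # made-up address
client.run(setup_foo_logging)
```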
xref https://github.com/pangeo-forge/pangeo-forge/issues/84
Sorry if this is off topic. Happy to move to a separate issue if needed.
It's been a while since I wanted to create an issue or comment on an open one on this subject, so 👍 from me also!
Any progress on this? I would be willing to help if needed
@AndreaGiardini I wasn't sure what the best approach would be to make this configurable. Do you have ideas about how this functionality should work?
I'm not a dask expert by any means but maybe we could consider something like:
- Adding an `include_stdout` flag in the `[logging]` section of the dask configuration
- `include_stdout` is a boolean, defaults to `False` (to keep current behaviour)
- if `include_stdout = True`, we can redirect stdout / stderr to a python logger (maybe we can have a separate `include_stderr`?)
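From the user's side that might look something like this (entirely hypothetical, since no such config key exists today):

```python
import dask

# Hypothetical key; nothing reads "logging.include_stdout" yet
dask.config.set({"logging.include_stdout": True})
```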
What do you think?
Putting this in without allowing configuration is probably actually dangerous. If there are applications running which print a lot, this would suddenly cause a lot of logs which might otherwise not be handled properly (file log rotation not set up, log aggregation servers overloaded, etc.). The question of what this configuration should look like is also interesting, since we might have different log handlers, log levels, etc. IIUC this is about user code prints being forwarded to the loggers, isn't it? What about offering a util function which defines a ctx manager to do so?
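A rough sketch of such a helper (the name and the writer shim are made up; this is not an existing distributed API):

```python
import contextlib
import logging

@contextlib.contextmanager
def print_to_logger(logger, level=logging.INFO):
    """Forward print() output to ``logger`` for the duration of the block."""

    class _LoggerWriter:
        def write(self, msg):
            msg = msg.rstrip()
            if msg:
                logger.log(level, msg)

        def flush(self):
            pass

    # Only sys.stdout is swapped; stderr is left alone
    with contextlib.redirect_stdout(_LoggerWriter()):
        yield
```

A task could then opt in with `with print_to_logger(logging.getLogger("distributed.worker")): ...` and everything else would keep the current behaviour.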
Yeah, this PR is really meant as a starting point and isn't mergeable until there is a config setting. I think a context manager sounds like a much better idea, @fjetter. Another potential idea is to have a worker plugin or something similar that the user has to explicitly add.
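For example, something like this (a sketch only; the plugin class and forwarding shim are made up, though `WorkerPlugin` itself is real):

```python
import logging
import sys
from distributed import WorkerPlugin

class StdoutToLog(WorkerPlugin):
    """Hypothetical opt-in plugin forwarding print() output to the worker logger."""

    def setup(self, worker):
        logger = logging.getLogger("distributed.worker")

        class _Forwarder:
            def write(self, msg):
                msg = msg.rstrip()
                if msg:
                    logger.info(msg)

            def flush(self):
                pass

        self._original_stdout = sys.stdout
        sys.stdout = _Forwarder()

    def teardown(self, worker):
        sys.stdout = self._original_stdout

# Users would opt in explicitly:
# client.register_worker_plugin(StdoutToLog())
```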