Sticky messages with subprocesses do not work well
When writing https://github.com/JuliaCI/PkgBenchmark.jl/pull/104, I realized that sticky messages do not work well with sub-processes. An example:
using Logging
using ProgressLogging
using TerminalLoggers
with_logger(TerminalLogger()) do
@progress for _ in 1:10
code = """
using Logging
using ProgressLogging
using TerminalLoggers
with_logger(TerminalLogger()) do
@info "From child process" getpid()
@progress for _ in 1:10
sleep(0.01)
end
end
"""
run(`julia --startup-file=no -e $code`)
end
end
But I'm not sure if it's probably impossible to fix this in TerminalLoggers.jl. Maybe implementing some workarounds using an environment variable could be helpful.
Ultimate solution may be to merge log records from subprocesses to the main process using IPC-based logger "bridge" (which is way outside of the scope of TerminalLoggers.jl, I guess, though).
Ultimate solution may be to merge log records from subprocesses to the main process using IPC-based logger "bridge" (which is way outside of the scope of TerminalLoggers.jl, I guess, though).
Yes, this is ultimately what's required: the code passed to the subprocess would need to install a logger which forwards messages to the parent process, and the parent process would need to set up an async task to listen for those. It "might be nice" if we could make this work automatically when the child process is Base.julia_cmd(), but for generic subprocesses which "just happen" to call julia it seems difficult.
One thing which certainly seems realistic is to make this work automatically when using Distributed to spawn the child processes.
Maybe implementing some workarounds using an environment variable could be helpful.
True, an environment variable could be used to define some "julia log server" context and that could be used to initialize the default global logger to something other than ConsoleLogger.
Hopefully getting the IPC right is not too much of a nightmare. For functionality on localhost (without authentication or encryption) I guess we could just use serialize and a socket. This seems very similar to the work already done in Distributed.LocalManager though, where the IPC side of things might already be sorted out.
So maybe getting it working with Distributed is the best thing to do first.
It "might be nice" if we could make this work automatically when the child process is
Base.julia_cmd(), but for generic subprocesses which "just happen" to call julia it seems difficult.
Yeah, I think that'd be super cool. I cannot decide if it is too much of a magic to happen automatically, though.
I agree adding a support for processes created via Distributed is the first step.
Maybe implementing some workarounds using an environment variable could be helpful.
True, an environment variable could be used to define some "julia log server" context
The workaround I was thinking was something stupid-simple: just use the "Sys.iswindows() code path" if the current process shares the same TTY with the parent process. But maybe it's worth doing this right from the getgo as the pitfall like the OP would probably be very rare anyway. Compared to that, logging inside Distributed is much more common. In fact, I already created something like this just for progress information in Transducers.jl to support progress logging in distributed reduction. I can throw it away if logging across workers just works.
I cannot decide if it is too much of a magic to happen automatically, though.
Haha, I suspect it probably is :-) If log record serialization is done via serialize we preserve full Julia objects which is great. But we get some other tricky problems like making sure all processes have the same modules loaded. So making it just work outside Distributed might be a bit of a can of worms.
Ah, yes, serialization is super tricky when two processes can activate different projects.
Hmm... I wonder if it's better to lower everything into (say) JSON-encodable primitive types (or msgpack/BSON/etc.) at each process. Probably there are already several language-agnostic logging frameworks that do this.
I don't think there's a one-size-fits-all approach.
I suspect that for Distributed it would be fairly natural to use serialize because Distributed seems to target cases where all processes are part of the same application. Also for HPC workloads where the logs may be serialized to disk per node for performance reasons, but it's still conceptually a "single application".
For more heterogeneous distributed systems I think the JSON-or-similar approach will be needed. There's a whole lot of log servers for the receive side which already do this kind of thing, so we could choose one of the better ones and take some conventions from that.