RFC-0026-logging-system
RFC for a consistent C++/Python message logging system in PyTorch
Feature was requested in https://github.com/pytorch/pytorch/issues/72948
Seems like a good proposal in general, @kurtamohler. I think it'd be helpful to expand this to the next level of detail and show how some of these scenarios would work, and how errors and warnings can originate in both Python and C++.
Would it be at all helpful if I included a somewhat full description (or links to documentation where possible) of the current message APIs in PyTorch? I am going to write notes on it regardless for my own reference, and I can include them here if it's a good idea.
EDIT: My notes on the current messaging APIs are here: https://github.com/kurtamohler/notes/blob/main/pytorch-messaging-api/current_messaging_api.md
Yep, that's a great idea
I've added a lot more detail, including a description of how message logging is currently done in PyTorch, and fixed some of the things brought up in discussion so far.
I haven't described all of the APIs for creating messages in the new system yet; I will do that next.
Hey @kurtamohler! FYI I'll be on PTO for the next several weeks, and I think @albanD is on PTO currently. So this may take some time to respond to.
We have torch.monitor (https://github.com/pytorch/rfcs/pull/30), which hasn't seen much usage; perhaps we can reuse some of it for the new logging system.
How is one supposed to selectively change the log verbosity of a specific subsystem? And how would a subsystem author produce logs that are properly scoped?
While having functions like torch.log_info is nice, it would be great if we could construct logger instances for particular modules. For example:
```python
# distributed.py
import torch

# Proposed API: torch.Logger is not an existing PyTorch class
logger = torch.Logger("torch.distributed")

def broadcast():
    logger.log_info("broadcasting stuff")
```
Finally, is there a reason why we can't integrate with Python's logging package, so that users need to do very little to leverage this?
@kumpera, good questions.
How is one supposed to selectively change the log verbosity of a specific subsystem?
At the moment, the general idea is that users will use a filter to silence messages. The specific API for that hasn't been written down yet, although in the case of warnings, Python's warnings module already does this. The missing detail is how users would silence info messages, and I imagine we would want something with an interface very similar to the warnings module's filters.
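To make the "similar interface to the warnings module filter" idea concrete, here is a minimal sketch of an info-message filter modeled on warnings.filterwarnings: the message pattern is a regex matched case-insensitively against the start of the message, and newer filters take precedence over older ones. The names filter_info, log_info, and the subsystem argument are hypothetical, chosen for illustration; they are not existing PyTorch APIs.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: none of these names exist in PyTorch.

@dataclass
class _InfoFilter:
    action: str          # "ignore" or "default"
    pattern: re.Pattern  # matched against the start of the message
    subsystem: str       # "" matches every subsystem

_filters: list[_InfoFilter] = []

def filter_info(action: str, message: str = "", subsystem: str = "") -> None:
    """Add a filter, modeled on warnings.filterwarnings; newest filters win."""
    if action not in ("ignore", "default"):
        raise ValueError(f"unknown action: {action!r}")
    _filters.insert(0, _InfoFilter(action, re.compile(message, re.I), subsystem))

def log_info(msg: str, subsystem: str = "") -> bool:
    """Print msg unless the first matching filter ignores it; return whether it was emitted."""
    for f in _filters:
        if f.pattern.match(msg) and (not f.subsystem or f.subsystem == subsystem):
            if f.action == "ignore":
                return False
            break  # first matching filter decides, as with warnings filters
    print(f"[INFO] {msg}")
    return True

# Silence one kind of info message from one subsystem only.
filter_info("ignore", message="broadcasting", subsystem="distributed")
log_info("broadcasting stuff", subsystem="distributed")  # suppressed
log_info("broadcasting stuff", subsystem="autograd")     # printed
```

Keeping the filter list ordered with the newest entry first mirrors how the warnings module resolves conflicting filters, so users who already know that module would have little new to learn.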
And how would a subsystem author produce logs that are properly scoped?
Authors would use the appropriate message class. If an applicable message class doesn't exist yet, they should add it.
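For warnings, the "message class" approach maps directly onto Python's warning categories. A minimal sketch, assuming a hypothetical TorchWarning base class with one subclass per subsystem (these class names are illustrative and are not defined by the RFC or by PyTorch): the subsystem author warns with its class, and users silence that subsystem with warnings.simplefilter without affecting other warnings.

```python
import warnings

# Hypothetical message classes for illustration only.
class TorchWarning(UserWarning):
    """Base class for every warning PyTorch emits."""

class DistributedWarning(TorchWarning):
    """Warnings originating in torch.distributed."""

class AutogradWarning(TorchWarning):
    """Warnings originating in autograd."""

def broadcast() -> None:
    # The subsystem author picks the class matching the subsystem.
    warnings.warn("broadcasting on a single rank is a no-op", DistributedWarning)

# User side: silence the distributed subsystem by category only.
warnings.simplefilter("ignore", DistributedWarning)
```

Filtering on TorchWarning instead would silence every PyTorch warning at once, which is why a per-subsystem class hierarchy is useful.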
While having functions like torch.log_info is nice, it would be great if we could construct logger instances for particular modules.
That might be a good idea. What do you think @albanD, @mruberry ?
Finally, is there a reason why we can't integrate with Python's logging package, so that users need to do very little to leverage this?
I don't know much about it; I'll read about it and get back to you.
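For comparison, here is a minimal sketch of what integrating with Python's logging package could look like. It assumes each subsystem gets a logger named torch.<subsystem>, an illustrative convention that the RFC does not specify. Because logger names are hierarchical, this would also give the per-module logger instances suggested earlier in the thread, and users would configure verbosity with ordinary logging calls.

```python
import logging

# Assumed convention: one standard-library logger per subsystem, named torch.<subsystem>.
logger = logging.getLogger("torch.distributed")

def broadcast() -> None:
    logger.info("broadcasting stuff")

# User side: plain logging configuration, nothing PyTorch-specific.
logging.basicConfig(format="%(name)s %(levelname)s: %(message)s")
logging.getLogger("torch").setLevel(logging.WARNING)           # quiet by default
logging.getLogger("torch.distributed").setLevel(logging.INFO)  # verbose for one subsystem

# With the config above, written to stderr as: torch.distributed INFO: broadcasting stuff
broadcast()
```

Subsystems without their own level inherit the level of the "torch" logger, so raising verbosity for one subsystem leaves the others quiet.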
And how would a subsystem author produce logs that are properly scoped?
Authors would use the appropriate message class. If an applicable message class doesn't exist yet, they should add it.
Sorry for not being specific here, what I meant by scoping is which subsystem is producing a given log message.
For example, if one is troubleshooting an issue in the backward pass of a distributed model, they would enable logging for the "autograd" and "distributed" subsystems.
I guess this boils down to whether log messages would carry a subsystem tag that can be used as part of filtering.
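One way to carry such a subsystem tag, sketched here on top of Python's logging package, is to treat the logger name itself as the tag and filter on it. SubsystemFilter is a hypothetical helper, and the rule that the tag is the second dotted component of the name (torch.autograd.engine -> autograd) is an assumption made for illustration.

```python
import logging

class SubsystemFilter(logging.Filter):
    """Hypothetical filter: pass records from torch.<subsystem>... loggers
    whose subsystem tag is in the enabled set."""

    def __init__(self, enabled: set[str]) -> None:
        super().__init__()
        self.enabled = enabled

    def filter(self, record: logging.LogRecord) -> bool:
        # Assumed tag rule: second dotted component of the logger name.
        parts = record.name.split(".")
        return len(parts) >= 2 and parts[0] == "torch" and parts[1] in self.enabled

# Troubleshooting backward of a distributed model: enable only two subsystems.
handler = logging.StreamHandler()
handler.addFilter(SubsystemFilter({"autograd", "distributed"}))
torch_logger = logging.getLogger("torch")
torch_logger.setLevel(logging.DEBUG)
torch_logger.addHandler(handler)
torch_logger.propagate = False  # keep these records away from root handlers

logging.getLogger("torch.autograd.engine").debug("running backward")  # shown
logging.getLogger("torch.distributed").debug("all_reduce")           # shown
logging.getLogger("torch.jit").debug("compiling")                     # filtered out
```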