composer
Help users reduce `ProgressBarLogger` print frequency
Making this a draft PR in case I'm biting off more than I can chew...
This PR does the following:
- Simplify the typing of `console_log_level: Union[str, int, LogLevel]`, because it seems pretty unlikely someone will build a whole Callable requiring imports of `LogLevel` and `State`. This also better matches the name of the variable, and matches the signature of other LoggerDestinations like `InMemoryLogger.log_level`.
- Set the default to `LogLevel.BATCH`, which I'm pretty sure makes more sense.
- Add `console_log_every_n_batches: int = 1` to allow users to easily scale down the frequency.
- Update some YAMLs to remove references to `progress_bar`, which is now added by default in the Trainer.
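To make the new knob concrete, here is a minimal sketch of the batch-throttling behavior that `console_log_every_n_batches` is meant to provide. The `ConsoleThrottler` class and `should_log` method are illustrative names, not Composer's actual implementation:

```python
class ConsoleThrottler:
    """Illustrative sketch: only emit console logs on every Nth batch.

    Not Composer's real class; it just demonstrates the proposed
    ``console_log_every_n_batches`` semantics.
    """

    def __init__(self, console_log_every_n_batches: int = 1) -> None:
        self.console_log_every_n_batches = console_log_every_n_batches

    def should_log(self, batch_idx: int) -> bool:
        # Log on batches 0, N, 2N, ... so the very first batch is always shown.
        return batch_idx % self.console_log_every_n_batches == 0


throttler = ConsoleThrottler(console_log_every_n_batches=5)
logged = [i for i in range(12) if throttler.should_log(i)]
# With the default of 1, every batch is logged, matching current behavior.
```

With `console_log_every_n_batches=5`, only batches 0, 5, 10, ... reach the console.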
Questions from Abhi:
- I see that the ProgressBarLogger is sort of added automatically to the Trainer, but then I also see all the `ProgressBarLogger` args passed in as well :( I wonder if that's a good practice or whether it would make more sense to just have `trainer.progress_bar_logger: bool = True`, which instantiates a good default `ProgressBarLogger`, but then if you want to do anything custom, you ought to just construct the object yourself... It makes me sad to see 4-5 `ProgressBarLogger`-specific args littering the trainer signature.
Relevant code snippet from `trainer.py`:

```python
loggers = list(ensure_tuple(loggers))
if any(isinstance(x, ProgressBarLogger) for x in loggers):
    warnings.warn(
        DeprecationWarning(
            (f'Specifying the {ProgressBarLogger.__name__} via `loggers` is deprecated. Instead, '
             'please specify `progress_bar`, `log_to_console`, `log_level`, and `stream` arguments when '
             'constructing the trainer. If specified, these arguments will be ignored, as the '
             f'{ProgressBarLogger.__name__} was already created.')))
```
- Is `log_every_n_batches` useful enough that it should be made global, or something that each LoggerDestination implements? I know we've had some debate before and I have a light preference for per-logger args, because in the cloud setting, I probably want super fine detailed logs going to WandB, but I want my console to be relatively sparse, maybe logging every 1000 batches.
The new user experience:

Default

```
$ composer examples/run_composer_trainer.py -f composer/yamls/models/bert-base.yaml
...
train 0%| | 229376/286720000 [00:45<10:33:07, 7541.75sp/s, loss/train=9.4105]
```
Using `--log_to_console`

```
$ composer examples/run_composer_trainer.py -f composer/yamls/models/bert-base.yaml --log_to_console
...
[BATCH][sample=94208/286720000]: { "trainer/global_step": 23, "trainer/batch_idx": 23, }
[BATCH][sample=94208/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=94208/286720000]: { "loss/train": 9.9996, }
[BATCH][sample=98304/286720000]: { "trainer/global_step": 24, "trainer/batch_idx": 24, }
[BATCH][sample=98304/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=98304/286720000]: { "loss/train": 9.9705, }
[BATCH][sample=102400/286720000]: { "trainer/global_step": 25, "trainer/batch_idx": 25, }
[BATCH][sample=102400/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=102400/286720000]: { "loss/train": 9.9430, }
[BATCH][sample=106496/286720000]: { "trainer/global_step": 26, "trainer/batch_idx": 26, }
train 0%| | 106496/286720000 [00:30<10:41:37, 7445.01sp/s, loss/train=9.9430]
```
Using `--log_to_console` and `--console_log_every_n_batches 5`

```
$ composer examples/run_composer_trainer.py -f composer/yamls/models/bert-base.yaml --log_to_console --console_log_every_n_batches 5
...
[BATCH][sample=163840/286720000]: { "trainer/global_step": 40, "trainer/batch_idx": 40, }
[BATCH][sample=163840/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=163840/286720000]: { "loss/train": 9.6202, }
[BATCH][sample=184320/286720000]: { "trainer/global_step": 45, "trainer/batch_idx": 45, }
[BATCH][sample=184320/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=184320/286720000]: { "loss/train": 9.5386, }
[BATCH][sample=204800/286720000]: { "trainer/global_step": 50, "trainer/batch_idx": 50, }
[BATCH][sample=204800/286720000]: { "trainer/grad_accum": 1, }
[BATCH][sample=204800/286720000]: { "loss/train": 9.4781, }
train 0%| | 217088/286720000 [00:45<11:25:49, 6962.53sp/s, loss/train=9.4541
```
> I see that the ProgressBarLogger is sort of added automatically to the Trainer, but then I also see all the `ProgressBarLogger` args passed in as well :( I wonder if that's a good practice or whether it would make more sense to just have `trainer.progress_bar_logger: bool = True` which instantiates a good default `ProgressBarLogger`, but then if you want to do anything custom, you ought to just construct the object yourself... It makes me sad to see 4-5 `ProgressBarLogger`-specific args littering the trainer signature.
From what I recall, we are deprecating allowing folks to pass in the ProgressBarLogger as a callback, for a technical reason that I do not quite recall... I think this has to do with how the console is handled. cc: @ravi-mosaicml
> I see that the ProgressBarLogger is sort of added automatically to the Trainer, but then I also see all the `ProgressBarLogger` args passed in as well :( I wonder if that's a good practice or whether it would make more sense to just have `trainer.progress_bar_logger: bool = True` which instantiates a good default `ProgressBarLogger`, but then if you want to do anything custom, you ought to just construct the object yourself... It makes me sad to see 4-5 `ProgressBarLogger`-specific args littering the trainer signature.
>
> From what I recall, we are deprecating allowing folks to pass in the ProgressBarLogger as a callback, for a technical reason that I do not quite recall... I think this has to do with how the console is handled. cc: @ravi-mosaicml
Yes, all prints should go through tqdm so they do not conflict with the progress bar. This allows tqdm to put the prints "above" the current pbar, which limits progress bar jumping and terminal clogging.
> Is `log_every_n_batches` useful enough that it should be made global, or something that each LoggerDestination implements? I know we've had some debate before and I have a light preference for per-logger args, because in the cloud setting, I probably want super fine detailed logs going to WandB, but I want my console to be relatively sparse, maybe logging every 1000 batches.
From our discussion with Evan, I think we decided that we would offload the frequency to the caller, rather than add it to each logger?
> From our discussion with Evan, I think we decided that we would offload the frequency to the caller, rather than add it to each logger?
I think it makes sense to allow the caller to do things on some custom frequency if it wants to, but in addition it seems pretty useful to have a logger- or global-scoped frequency limiter... for instance, right now I have like 4 different things producing logs: the Trainer itself, my LRMonitor, my SpeedMonitor, the GradientClippingAlgorithm. I certainly don't want to pass `log_every_n_batches` to every one of these objects, right?
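One way to read this suggestion: the limiter lives on the destination, so the four callbacks all log unconditionally and each destination decides how often to actually emit. A minimal sketch, assuming hypothetical `LoggerDestination.log_data` and `ThrottledDestination` names (neither is Composer's real API):

```python
class LoggerDestination:
    """Stand-in for composer's LoggerDestination base class (illustrative)."""

    def log_data(self, batch_idx: int, data: dict) -> None:
        raise NotImplementedError


class ListDestination(LoggerDestination):
    """Records everything it receives, standing in for console/WandB/etc."""

    def __init__(self) -> None:
        self.records = []

    def log_data(self, batch_idx: int, data: dict) -> None:
        self.records.append((batch_idx, data))


class ThrottledDestination(LoggerDestination):
    """Destination-scoped frequency limiter: callers (LRMonitor, SpeedMonitor,
    ...) never need to know about the frequency; each destination filters."""

    def __init__(self, inner: LoggerDestination, log_every_n_batches: int = 1) -> None:
        self.inner = inner
        self.log_every_n_batches = log_every_n_batches

    def log_data(self, batch_idx: int, data: dict) -> None:
        if batch_idx % self.log_every_n_batches == 0:
            self.inner.log_data(batch_idx, data)


# A sparse console next to a verbose destination, fed by the same callers:
console = ThrottledDestination(ListDestination(), log_every_n_batches=1000)
wandb_like = ListDestination()
for batch_idx in range(3000):
    for dest in (console, wandb_like):
        dest.log_data(batch_idx, {"loss/train": 1.0})
```

Here the console destination keeps only batches 0, 1000, and 2000, while the verbose destination keeps all 3000, without any per-callback frequency args.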
> Questions from Abhi:
>
> - I see that the ProgressBarLogger is sort of added automatically to the Trainer, but then I also see all the `ProgressBarLogger` args passed in as well :( I wonder if that's a good practice or whether it would make more sense to just have `trainer.progress_bar_logger: bool = True`, which instantiates a good default `ProgressBarLogger`, but then if you want to do anything custom, you ought to just construct the object yourself... It makes me sad to see 4-5 `ProgressBarLogger`-specific args littering the trainer signature.
I very much agree with this! I didn't do it in my PR, but want to do it eventually. I'd certainly be down if you did it in yours and I could deal with the merge.
Relevant code snippet from `trainer.py`:

```python
loggers = list(ensure_tuple(loggers))
if any(isinstance(x, ProgressBarLogger) for x in loggers):
    warnings.warn(
        DeprecationWarning(
            (f'Specifying the {ProgressBarLogger.__name__} via `loggers` is deprecated. Instead, '
             'please specify `progress_bar`, `log_to_console`, `log_level`, and `stream` arguments when '
             'constructing the trainer. If specified, these arguments will be ignored, as the '
             f'{ProgressBarLogger.__name__} was already created.')))
```
Let's remove this.
I am a fan of the pattern of one arg in the trainer to enable it, with good defaults that are well-documented. Then, if you want customizable behavior, you can pass in your own. I really think we should move towards the pattern that IF we create an object under the hood that we want to abstract away from the user, then we ONLY expose one argument for configuring it in the trainer.
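That "one argument, bool or instance" pattern could look something like the sketch below. The `Union[bool, ProgressBarLogger]` signature is the proposal under discussion, not Composer's current API, and the constructor args here are illustrative:

```python
from typing import Optional, Union


class ProgressBarLogger:
    """Stand-in for composer's ProgressBarLogger; args are illustrative."""

    def __init__(self, progress_bar: bool = True, log_to_console: bool = False) -> None:
        self.progress_bar = progress_bar
        self.log_to_console = log_to_console


class Trainer:
    """Sketch: one trainer arg either enables a well-documented default
    ProgressBarLogger (True), disables it (False), or accepts a fully
    custom instance the user constructed themselves."""

    def __init__(self, progress_bar_logger: Union[bool, ProgressBarLogger] = True) -> None:
        if progress_bar_logger is True:
            self.progress_bar_logger: Optional[ProgressBarLogger] = ProgressBarLogger()
        elif progress_bar_logger is False:
            self.progress_bar_logger = None
        else:
            self.progress_bar_logger = progress_bar_logger


# Default: a sensible ProgressBarLogger is created under the hood.
default_trainer = Trainer()
# Custom: construct the object yourself instead of 4-5 trainer args.
custom_trainer = Trainer(progress_bar_logger=ProgressBarLogger(log_to_console=True))
```

This keeps the trainer signature to a single argument while still allowing arbitrary customization through the instance path.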
> Is `log_every_n_batches` useful enough that it should be made global, or something that each LoggerDestination implements? I know we've had some debate before and I have a light preference for per-logger args, because in the cloud setting, I probably want super fine detailed logs going to WandB, but I want my console to be relatively sparse, maybe logging every 1000 batches.
I like that actually. I think removing `log_level`s, defaulting per-batch, and then exposing `log_every_n_batches` is fine.
Two follow-up questions:
- What do we do about per-microbatch if someone wants that (another arg, I guess?)
- I think it's becoming clear that `ProgressBarLogger` and `FileLogger` should really be separate from the other loggers potentially? It's hard to do a one-size-fits-all solution when `ProgressBarLogger` is just very different conceptually from `WandBLogger` and the others.
> From our discussion with Evan, I think we decided that we would offload the frequency to the caller, rather than add it to each logger?
>
> I think it makes sense to allow the caller to do things on some custom frequency if it wants to, but in addition it seems pretty useful to have a logger- or global-scoped frequency limiter... for instance, right now I have like 4 different things producing logs: the Trainer itself, my LRMonitor, my SpeedMonitor, the GradientClippingAlgorithm. I certainly don't want to pass `log_every_n_batches` to every one of these objects, right?
As mentioned above I will remove `log_level`s, but yeah, we might have to add a `log_every_n_batches` to every `LoggerDestination` (not every `Callback` that calls a `Logger.log` function!), but IF `ProgressBarLogger` was separate from the others, we could set different defaults for `ProgressBarLogger` vs. `WandBLogger`...
Evaluators have an optional `eval_interval` argument. If it's not passed, the evaluator defaults to using the trainer's `eval_interval`. How do people feel about implementing `log_every_n_batches` similarly to how we implement `eval_interval`, such that you can specify `log_every_n_batches` for each logger, and if it's not specified it falls back to the trainer value (which has a sensible default)? And perhaps renaming `log_every_n_batches` to `log_interval` for brevity?
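The `eval_interval`-style fallback being proposed could be sketched like this. All names here (`ConsoleLogger`, the `log_interval` attribute, the trainer wiring) are hypothetical, mirroring the Evaluator pattern rather than any existing Composer code:

```python
from typing import List, Optional


class ConsoleLogger:
    """Illustrative logger: ``log_interval=None`` means "inherit the
    trainer-level default", mirroring how Evaluators treat eval_interval."""

    def __init__(self, log_interval: Optional[int] = None) -> None:
        self.log_interval = log_interval


class Trainer:
    """Sketch: any logger that did not set its own log_interval falls back
    to the trainer's value, which itself has a sensible default."""

    def __init__(self, loggers: List[ConsoleLogger], log_interval: int = 1) -> None:
        self.loggers = loggers
        for logger in loggers:
            if logger.log_interval is None:
                logger.log_interval = log_interval  # fall back to trainer value


# A sparse console keeps its explicit value; the other inherits the default.
sparse = ConsoleLogger(log_interval=1000)
inherited = ConsoleLogger()
Trainer(loggers=[sparse, inherited], log_interval=10)
```

After construction, `sparse.log_interval` is still 1000 while `inherited.log_interval` becomes 10, so per-logger overrides coexist with one global knob.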