No in-built functionality for tracking of metrics during training
This is a feature request bordering on a bug. Right now, flambe does not provide a way to track metrics during training. This, however, is essential for monitoring learning.
One problem I see is that it does not make sense to compute the training metrics only after an entire training epoch, as flambe does for test/eval metrics. Given the size of some datasets, that is not really feasible.
Consequently, the metric interface needs to be able to accommodate incremental computation of the metrics. That, in turn, requires a decision as to how this should be implemented, partly because not every metric supports incremental computation (think: AUC). Unfortunately, incremental computation requires keeping track of previous computations - i.e., we need a state that we update incrementally.
Off the top of my head, these are the choices we have:
First option: make the metrics stateful.
- The metrics would then have to be "reset" at the beginning of each epoch
- An `incremental` method, added to the metric, could be used to update the metric
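A minimal sketch of what such a stateful metric could look like (all names, including `incremental` and `finalize`, are placeholders - nothing here exists in flambe yet):

```python
import torch


class StatefulAccuracy:
    """Hypothetical stateful accuracy metric; not existing flambe API."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Called at the beginning of each epoch to clear the accumulated state.
        self.correct = 0
        self.total = 0

    def incremental(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        # Update the running counts with one batch of predictions.
        self.correct += (preds.argmax(dim=1) == targets).sum().item()
        self.total += targets.size(0)

    def finalize(self) -> float:
        # Compute the final value from the accumulated state.
        return self.correct / max(self.total, 1)
```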
Second option: add a metric-state object.
- Flambe initializes a metric-state object at the beginning of each epoch.
- This metric-state object is passed into each `incremental` call of the metric (and possibly into every other call, to keep a uniform interface)
- Logging can happen automatically in a method of that state object
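A rough sketch of this option, assuming a hypothetical `MetricState` container that the trainer creates at the start of each epoch (again, all names are illustrative only):

```python
from collections import defaultdict
from typing import Callable, Dict

import torch


class MetricState:
    """Hypothetical per-epoch state container, created by the trainer."""

    def __init__(self) -> None:
        # One slot per metric; the metrics themselves never hold state.
        self.slots: Dict[str, dict] = defaultdict(dict)

    def log_all(self, log_fn: Callable[[str, float, int], None], step: int) -> None:
        # Logging lives on the state object, so the trainer makes a single call.
        for name, slot in self.slots.items():
            if 'value' in slot:
                log_fn(name, slot['value'], step)


class Accuracy:
    """Stateless metric: all intermediate values go into the MetricState."""

    name = 'Training/Accuracy'

    def incremental(self, preds: torch.Tensor, targets: torch.Tensor,
                    state: MetricState) -> None:
        slot = state.slots[self.name]
        slot['correct'] = slot.get('correct', 0) + (preds.argmax(dim=1) == targets).sum().item()
        slot['total'] = slot.get('total', 0) + targets.size(0)

    def finalize(self, state: MetricState) -> None:
        slot = state.slots[self.name]
        slot['value'] = slot['correct'] / max(slot['total'], 1)
```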
Third option: add local tracking for each metric (I don't think this is a good option, but I wanted to mention it for completeness)
- This works like the metric state object, but with individual state objects per metric.
We could also consider just computing the metrics on a per-batch level during training and logging that, but then things like dropout will affect the training metrics. That's true in your proposed solutions as well, unless you're thinking of doing this during the eval step?
The problem with the per-batch level is metrics like AUC. If we are using the batch as negatives (as is quite common), computing the AUC per batch will be much less accurate than computing it per epoch (using all samples from an epoch as negatives).
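To make that concrete: an AUC-style metric cannot fold its state into a couple of counters; it has to accumulate the raw scores for the whole epoch and only compute the value at the end. A hedged sketch (class name hypothetical, using `sklearn.metrics.roc_auc_score` for the final computation):

```python
import torch
from sklearn.metrics import roc_auc_score


class EpochAUC:
    """Hypothetical AUC metric: per-batch AUC values cannot simply be
    averaged, so the raw scores for the whole epoch are accumulated."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.scores: list = []
        self.targets: list = []

    def incremental(self, scores: torch.Tensor, targets: torch.Tensor) -> None:
        # No partial AUC here; only accumulate the raw values.
        self.scores.append(scores.detach().cpu())
        self.targets.append(targets.detach().cpu())

    def finalize(self) -> float:
        # AUC over the whole epoch, using every sample as a potential negative.
        return roc_auc_score(torch.cat(self.targets).numpy(),
                             torch.cat(self.scores).numpy())
```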
Besides, either approach would allow us to unify this (taken from `_eval_step` in `train.py`):
```python
log(f'{tb_prefix}Validation/Loss', val_loss, self._step)
log(f'{tb_prefix}Validation/{self.metric_fn}', val_metric, self._step)
log(f'{tb_prefix}Best/{self.metric_fn}', self._best_metric, self._step)  # type: ignore
for metric_name, metric in self.extra_validation_metrics.items():
    log(f'{tb_prefix}Validation/{metric_name}',
        metric(preds, targets).item(), self._step)  # type: ignore
```
With either

```python
for metric in self.metrics:
    metric.finalize()
    metric.log(log_func)  # log_func could be any log function, defaulting to the one above
```
Or

```python
for metric in self.metrics:
    metric.finalize(metrics_state)
    metric.log(log_func, metrics_state)
```
That has the additional advantage that we would natively support logging of more complex metrics. Imagine, e.g., a combined recall-precision-fscore metric that could jointly log all three. Or one that computes a conditional metric if, say, you have different types of samples; then it could log things like "accuracy for type1: ..." and "accuracy for type2: ...".
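As a purely illustrative sketch of such a compound metric (names and log keys invented here):

```python
from typing import Callable

import torch


class PrecisionRecallF1:
    """Hypothetical compound metric that logs several values at once."""

    def __init__(self) -> None:
        self.tp = self.fp = self.fn = 0

    def incremental(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        # preds and targets are assumed to be binary tensors of the same shape.
        self.tp += int(((preds == 1) & (targets == 1)).sum())
        self.fp += int(((preds == 1) & (targets == 0)).sum())
        self.fn += int(((preds == 0) & (targets == 1)).sum())

    def finalize(self) -> None:
        self.precision = self.tp / max(self.tp + self.fp, 1)
        self.recall = self.tp / max(self.tp + self.fn, 1)
        self.fscore = (2 * self.precision * self.recall
                       / max(self.precision + self.recall, 1e-12))

    def log(self, log_fn: Callable[[str, float, int], None], step: int) -> None:
        # One metric object, three logged values.
        log_fn('Training/Precision', self.precision, step)
        log_fn('Training/Recall', self.recall, step)
        log_fn('Training/F1', self.fscore, step)
```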
What do you propose to do with the training and validation loss over the whole dataset? Since people generally use torch loss objects which won't have the `incremental` logic?
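Purely as an illustration of what that incremental logic could look like - not a proposal for flambe's API - a thin wrapper around a torch loss (assuming `reduction='mean'`, the torch default) could keep a sample-weighted running average:

```python
import torch
import torch.nn as nn


class IncrementalLoss:
    """Hypothetical wrapper giving a torch loss an incremental interface."""

    def __init__(self, loss_fn: nn.Module) -> None:
        # Assumes the wrapped loss uses reduction='mean'.
        self.loss_fn = loss_fn
        self.reset()

    def reset(self) -> None:
        self.total = 0.0
        self.count = 0

    def incremental(self, preds: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        loss = self.loss_fn(preds, targets)
        # Weight by batch size so the epoch average equals the full-dataset loss.
        self.total += loss.item() * targets.size(0)
        self.count += targets.size(0)
        return loss  # still differentiable, so training can backprop through it

    def finalize(self) -> float:
        return self.total / max(self.count, 1)
```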