composer [Q] How is the output of validate mapped to metrics?

I'm trying to do something simple: log validation loss every N batches. Reading through the docs I find this:

    def validate(self, batch):
        labels = batch.pop('labels')
        output = self.forward(batch)
        output = output['logits']
        return output, labels

    def metrics(self, train: bool = False):
        if train:
            return MetricCollection([self.train_loss, self.train_acc])
        return MetricCollection([self.val_loss, self.val_acc])

I'm confused about how the output of the validate method maps to the metrics method. Is self.train_acc / self.val_acc only updated with the labels value from the validate method?

Also, I'm assuming validation metrics are computed once for the validation set. But what about training metrics?

Aug 05 '22 07:08 vedantroy

Hi @vedantroy , the validation loop does something like this:

    metrics = model.metrics(train=False)

    for batch in val_dataloader:
        outputs, targets = model.validate(batch)
        metrics.update(outputs, targets)  # implements the torchmetrics interface

    metrics.compute()

More details can be found here: https://docs.mosaicml.com/en/v0.8.2/composer_model.html#metrics

And yes, validation metrics are computed once for each validation set. To compute training metrics as well, be sure to pass compute_training_metrics=True to the Trainer. This will have a performance penalty.
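
To make the update/compute flow concrete, here is a minimal pure-Python sketch. The Accuracy, ErrorRate, and Collection classes and the run_validation helper are hypothetical stand-ins written for illustration, not Composer or torchmetrics code; they only mimic the update/compute/reset shape of the torchmetrics interface. The point it shows is that every metric in the collection receives the same (outputs, targets) pair returned by validate.

```python
# Hypothetical pure-Python stand-ins for the torchmetrics-style interface.
# None of these classes come from Composer or torchmetrics.

class Accuracy:
    """Fraction of samples whose highest-scoring class equals the target."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, outputs, targets):
        # outputs: list of per-class score lists; targets: list of class indices
        for scores, target in zip(outputs, targets):
            predicted = max(range(len(scores)), key=scores.__getitem__)
            self.correct += int(predicted == target)
            self.total += 1

    def compute(self):
        return self.correct / self.total


class ErrorRate(Accuracy):
    """Same accumulated state as Accuracy, reported as 1 - accuracy."""

    def compute(self):
        return 1.0 - super().compute()


class Collection:
    """Forwards the same (outputs, targets) pair to every metric it holds."""

    def __init__(self, metrics):
        self.metrics = dict(metrics)

    def update(self, outputs, targets):
        for metric in self.metrics.values():
            metric.update(outputs, targets)

    def compute(self):
        return {name: m.compute() for name, m in self.metrics.items()}


def run_validation(batches, metrics):
    # Each element of `batches` stands in for the result of model.validate(batch).
    for outputs, targets in batches:
        metrics.update(outputs, targets)
    return metrics.compute()  # computed once, after the whole set is seen


batches = [
    ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),  # both predictions correct
    ([[0.3, 0.7], [0.6, 0.4]], [0, 0]),  # first wrong, second correct
]
results = run_validation(batches, Collection({"acc": Accuracy(), "err": ErrorRate()}))
print(results)
```

So the labels are not the only thing a metric sees: each metric gets both values of the tuple and decides in its own update what to do with them.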

Aug 06 '22 17:08 hanlint

@hanlint How does this work when I set up torchmetrics to use a dictionary? See: https://github.com/Lightning-AI/metrics/issues/682

Should I return a dictionary from my validate method?

Aug 08 '22 22:08 vedantroy

Hi @vedantroy , the validate method should return a tuple of (outputs, targets), which is then passed to the metrics' update call.

Since your torchmetrics setup uses a MetricCollection, that should work. Even though you supplied a dictionary, it is wrapped in a MetricCollection, which we support.

Aug 09 '22 13:08 hanlint

Edit: Figured out how torchmetrics works. Below stuff is irrelevant.

@hanlint I'm confused as to how this interface works. For example: I want to log 2 different losses from my validation batch. There's no output or target, there's just 2 different scalar values?

For example: I don't really have "targets". I have this thing called a GaussianDiffusionProcess that handles calculating different loss values, there is no straightforward segmentation mask or something like that.

Aug 10 '22 02:08 vedantroy

@hanlint This is the thing I want to log for my validation batch https://github.com/vedantroy/improved-ddpm-pytorch/blob/d2d6954f19b7b850bb45aff815f1329df3f2a5f4/diffusion/diffusion.py#L293

(Irrelevant, figured it out; I need to implement some custom metrics)
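
For anyone landing here with the same question, a rough sketch of what such a custom metric could look like. This is a pure-Python stand-in with the same update/compute/reset shape; a real one would subclass torchmetrics.Metric and register its state with add_state. The MeanOfKey class, the loss names "mse" and "vlb", and the idea of returning a dict of per-batch scalar losses as outputs with None as targets are all assumptions for illustration, not something confirmed in this thread.

```python
class MeanOfKey:
    """Running mean of one named scalar taken from the `outputs` dict.

    Hypothetical stand-in for a custom metric; `targets` is accepted to
    match the update(outputs, targets) call shape but is ignored.
    """

    def __init__(self, key):
        self.key = key
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, outputs, targets=None):
        self.total += float(outputs[self.key])
        self.count += 1

    def compute(self):
        # Unweighted mean over batches (assumes equally sized batches).
        return self.total / self.count


# Hypothetical loss names; each dict stands in for the `outputs` of one batch.
val_metrics = {"mse": MeanOfKey("mse"), "vlb": MeanOfKey("vlb")}
fake_batches = [{"mse": 0.5, "vlb": 2.0}, {"mse": 0.25, "vlb": 4.0}]

for outputs in fake_batches:
    for metric in val_metrics.values():
        metric.update(outputs, None)  # no meaningful targets in this setup

summary = {name: m.compute() for name, m in val_metrics.items()}
print(summary)
```

Each metric picks out the one scalar it cares about, so two losses become two metrics fed from the same outputs value.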

Aug 10 '22 03:08 vedantroy

Fwiw, the specific thing that confused me in the documentation: for some reason I assumed the validate method would be something along the lines of "return a tuple of N values where each of the N values is a particular loss/signal you are calculating", and it was unclear to me from the docs whether that was the case.

Aug 10 '22 03:08 vedantroy

Thanks! We've updated the documentation to reflect this in https://github.com/mosaicml/composer/pull/1396

Aug 30 '22 03:08 hanlint
