
[Q] How is the output of validate mapped to metrics?

Open vedantroy opened this issue 3 years ago • 6 comments

I'm trying to do something simple: log validation loss every N batches. Reading through the docs I find this:

    def validate(self, batch):
        labels = batch.pop('labels')
        output = self.forward(batch)
        output = output['logits']
        return output, labels

    def metrics(self, train: bool = False):
        if train:
            return MetricCollection([self.train_loss, self.train_acc])
        return MetricCollection([self.val_loss, self.val_acc])

I'm confused about how the output of the validate method maps to the metrics method. Is self.train_acc / self.val_acc only updated with the labels value from the validate method?

Also, I'm assuming validation metrics are computed once for the validation set. But what about training metrics?

vedantroy avatar Aug 05 '22 07:08 vedantroy

Hi @vedantroy , the validation loop does something like this:

    metrics = model.metrics(train=False)

    for batch in val_dataloader:
        outputs, targets = model.validate(batch)
        metrics.update(outputs, targets)  # implements the torchmetrics interface

    metrics.compute()

More details can be found here: https://docs.mosaicml.com/en/v0.8.2/composer_model.html#metrics

And yes, validation metrics are computed once for each validation set. To enable computing the training metrics, be sure to pass compute_training_metrics=True to the Trainer. This will have a performance penalty.
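To make the validation loop concrete: every metric in the collection receives the same (outputs, targets) pair, so an accuracy metric sees both the model outputs and the labels, not the labels alone. Below is a dependency-free sketch of that flow. ToyAccuracy, ToyMeanAbsError, ToyCollection and ToyModel are hypothetical stand-ins written for illustration; they only mimic the update/compute interface and are not Composer or torchmetrics classes.

```python
class ToyAccuracy:
    """Hypothetical stand-in for an accuracy metric."""

    def __init__(self):
        self.correct = 0
        self.seen = 0

    def update(self, outputs, targets):
        # Threshold each probability at 0.5 and compare with the 0/1 label.
        self.correct += sum(int(p >= 0.5) == t for p, t in zip(outputs, targets))
        self.seen += len(targets)

    def compute(self):
        return self.correct / self.seen


class ToyMeanAbsError:
    """Hypothetical stand-in for a loss-like metric."""

    def __init__(self):
        self.total = 0.0
        self.seen = 0

    def update(self, outputs, targets):
        self.total += sum(abs(p - t) for p, t in zip(outputs, targets))
        self.seen += len(targets)

    def compute(self):
        return self.total / self.seen


class ToyCollection:
    """Fans one (outputs, targets) pair out to every metric it holds."""

    def __init__(self, metrics):
        self.metrics = dict(metrics)

    def update(self, outputs, targets):
        for metric in self.metrics.values():
            metric.update(outputs, targets)

    def compute(self):
        return {name: metric.compute() for name, metric in self.metrics.items()}


class ToyModel:
    """Mirrors the validate/metrics pair from the docs example."""

    def validate(self, batch):
        labels = batch.pop("labels")
        outputs = batch["probs"]  # a real model would run a forward pass here
        return outputs, labels

    def metrics(self, train=False):
        return ToyCollection({"val_acc": ToyAccuracy(), "val_mae": ToyMeanAbsError()})


model = ToyModel()
val_batches = [
    {"probs": [0.9, 0.2], "labels": [1, 0]},
    {"probs": [0.4, 0.7], "labels": [1, 1]},
]

metrics = model.metrics(train=False)
for batch in val_batches:
    outputs, targets = model.validate(batch)
    metrics.update(outputs, targets)  # same pair goes to every metric

results = metrics.compute()  # one value per metric, over the whole validation set
```

State accumulates across batches in update, and compute is called once at the end, which is why validation metrics come out as a single value per validation set.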

hanlint avatar Aug 06 '22 17:08 hanlint

@hanlint How does this work when I set up torchmetrics to use a dictionary? See: https://github.com/Lightning-AI/metrics/issues/682

Should I return a dictionary from my validate method?

vedantroy avatar Aug 08 '22 22:08 vedantroy

Hi @vedantroy , the validate method should return a tuple of (outputs, targets), which is then passed to the metrics' update call.

Since your metrics are in a MetricCollection, that should work. The dictionary you supplied is wrapped in a MetricCollection, which we support.

hanlint avatar Aug 09 '22 13:08 hanlint

Edit: Figured out how torchmetrics works. Below stuff is irrelevant.

@hanlint I'm confused as to how this interface works. For example: I want to log 2 different losses from my validation batch. There's no output or target, there's just 2 different scalar values?

For example: I don't really have "targets". I have this thing called a GaussianDiffusionProcess that handles calculating different loss values, there is no straightforward segmentation mask or something like that.

vedantroy avatar Aug 10 '22 02:08 vedantroy

@hanlint This is the thing I want to log for my validation batch https://github.com/vedantroy/improved-ddpm-pytorch/blob/d2d6954f19b7b850bb45aff815f1329df3f2a5f4/diffusion/diffusion.py#L293

(Irrelevant, figured it out; I need to implement some custom metrics)
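For anyone landing here with the same question, a minimal sketch of what such a custom metric might look like. It assumes validate returns a dict of scalar losses as the outputs and a placeholder for the targets; whether Composer accepts a None target is an assumption not confirmed in this thread. MeanScalar and the loss names "mse" and "vb" are hypothetical. The class is plain Python so it runs without dependencies; a real version would presumably subclass torchmetrics.Metric and register its state with add_state so it can be synchronized across devices.

```python
class MeanScalar:
    """Hypothetical metric: running mean of one named scalar in the outputs dict."""

    def __init__(self, key):
        self.key = key
        self.reset()

    def reset(self):
        self.total = 0.0
        self.batches = 0

    def update(self, outputs, targets=None):
        # targets is ignored: the loss values are already computed in validate.
        self.total += float(outputs[self.key])
        self.batches += 1

    def compute(self):
        # Unweighted mean over batches (assumes equal batch sizes).
        return self.total / self.batches


# Pretend these (outputs, targets) pairs came from model.validate(batch).
validate_results = [
    ({"mse": 0.5, "vb": 2.0}, None),
    ({"mse": 0.25, "vb": 1.0}, None),
]

loss_metrics = {"val_mse": MeanScalar("mse"), "val_vb": MeanScalar("vb")}
for outputs, targets in validate_results:
    for metric in loss_metrics.values():
        metric.update(outputs, targets)

logged = {name: metric.compute() for name, metric in loss_metrics.items()}
```

The design choice here is one metric instance per loss, each picking its own key out of the shared outputs, so the validate method still returns a single (outputs, targets) tuple rather than a tuple of N values.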

vedantroy avatar Aug 10 '22 03:08 vedantroy

Fwiw, the specific thing that confused me in the documentation: for some reason I assumed the validate method would be something along the lines of "return a tuple of N values, where each of the N values is a particular loss/signal you are calculating", and it was unclear to me whether that is what was happening in the docs.

vedantroy avatar Aug 10 '22 03:08 vedantroy

Thanks! We've updated the documentation to reflect this in https://github.com/mosaicml/composer/pull/1396

hanlint avatar Aug 30 '22 03:08 hanlint