composer
[Q] How is the output of validate mapped to metrics?
I'm trying to do something simple: log validation loss every N batches. Reading through the docs I find this:
def validate(self, batch):
    labels = batch.pop('labels')
    output = self.forward(batch)
    output = output['logits']
    return output, labels

def metrics(self, train: bool = False):
    if train:
        return MetricCollection([self.train_loss, self.train_acc])
    return MetricCollection([self.val_loss, self.val_acc])
I'm confused about how the output of the validate method maps to the metrics method.
Is self.train_acc / self.val_acc only updated with the labels value from the validate method?
Also, I'm assuming validation metrics are computed once for the validation set. But what about training metrics?
Hi @vedantroy , the validation loop does something like this:
metrics = model.metrics(train=False)
for batch in val_dataloader:
    outputs, targets = model.validate(batch)
    metrics.update(outputs, targets)  # implements the torchmetrics interface
metrics.compute()
More details can be found here: https://docs.mosaicml.com/en/v0.8.2/composer_model.html#metrics
And yes, validation metrics are computed once for each validation set. To also compute training metrics, be sure to pass compute_training_metrics=True to the Trainer. This will have a performance penalty.
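To make the loop above concrete, here is a minimal pure-Python sketch of the same update/compute pattern. It is not Composer's or torchmetrics' actual code: Accuracy, ToyModel, and run_validation are hypothetical stand-ins, and the "prediction" rule is made up for illustration. The point is that the metric receives both values returned by validate (outputs and targets) on every batch, and compute is called once after the whole dataloader has been consumed.

```python
class Accuracy:
    """Stand-in for a torchmetrics-style metric (hypothetical, not the real class)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, outputs, targets):
        # Accumulate state from one batch; nothing is reduced yet.
        self.correct += sum(int(o == t) for o, t in zip(outputs, targets))
        self.total += len(targets)

    def compute(self):
        # Reduce the accumulated state over all batches seen so far.
        return self.correct / self.total


class ToyModel:
    """Hypothetical model exposing validate() and metrics() like the snippet above."""

    def __init__(self):
        self.val_acc = Accuracy()

    def validate(self, batch):
        inputs, labels = batch
        outputs = [x % 2 for x in inputs]  # made-up "prediction" rule
        return outputs, labels

    def metrics(self, train=False):
        return self.val_acc


def run_validation(model, dataloader):
    metric = model.metrics(train=False)
    for batch in dataloader:
        outputs, targets = model.validate(batch)
        metric.update(outputs, targets)  # both values reach the metric
    return metric.compute()  # called once, after the last batch


val_dataloader = [([1, 2, 3], [1, 0, 0]), ([4, 5], [0, 1])]
val_accuracy = run_validation(ToyModel(), val_dataloader)
print(val_accuracy)
```

With a MetricCollection, the same (outputs, targets) pair is handed to each metric in the collection, so every metric decides for itself how to use the two values.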
@hanlint How does this work when I set up torchmetrics to use a dictionary? See: https://github.com/Lightning-AI/metrics/issues/682
Should I return a dictionary from my validate method?
Hi @vedantroy , the validate method should return a tuple of outputs, targets that is then used to call metrics.
Since your torchmetrics is using a MetricCollection, that should work. Even though you supplied a dictionary, that is wrapped in a MetricCollection, which we support.
Edit: Figured out how torchmetrics works. Below stuff is irrelevant.
@hanlint I'm confused as to how this interface works. For example: I want to log 2 different losses from my validation batch. There's no output or target, there's just 2 different scalar values?
For example: I don't really have "targets". I have this thing called a GaussianDiffusionProcess that handles calculating different loss values, there is no straightforward segmentation mask or something like that.
@hanlint This is the thing I want to log for my validation batch https://github.com/vedantroy/improved-ddpm-pytorch/blob/d2d6954f19b7b850bb45aff815f1329df3f2a5f4/diffusion/diffusion.py#L293
(Irrelevant, figured it out; I need to implement some custom metrics)
Fwiw, the specific thing that confused me in the documentation was that I assumed (for some reason) the validate method would be something along the lines of "return a tuple of N values, where each of the N values is a particular loss/signal you are calculating", and it was unclear to me whether that was what was happening in the docs.
Thanks! We've updated the documentation to reflect this in https://github.com/mosaicml/composer/pull/1396