
[RFC] Metrics Parameter in training loops

Open oke-aditya opened this issue 4 years ago • 6 comments

🚀 Feature

Slightly important, as the engines are concerned here :sweat_smile: Add a metrics parameter in training.

Motivation

Currently, the train_step, val_step, and fit APIs calculate metrics from torchvision and return them in a dictionary, but do not return predictions. This hardcodes the API and is troublesome; a metrics param would open it up. Since train_step always returns metrics and never returns results / model outputs, let's add a metrics param which will work as follows.

Pitch

For detection models in PyTorch trainers, the param would work as follows.

metrics: Optional[str] = None

if metrics is None:
    # Just output what the model does and return it.
    # People can then manually calculate metrics and play around as they like.
    # We do not do post-processing; the utilities we provide can be used by the user.
    ...
elif metrics == "native":
    # Use the torchvision native metrics which we have now.
    # We can expect these to improve over time or adopt Lightning metrics here.
    ...
# Maybe, if we don't get this dependency:
elif metrics == "coco":
    # Calculate COCO API metrics (we do the post-processing if needed).
    ...
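A minimal sketch of how this dispatch could look inside a detection train_step (everything below is hypothetical and not the current Quickvision code; the function signature and the empty metrics dicts are placeholders):

from typing import Any, Dict, Optional

def train_step(model, batch, metrics: Optional[str] = None) -> Dict[str, Any]:
    # Hypothetical sketch, not the current Quickvision API.
    images, targets = batch
    outputs = model(images, targets)  # torchvision detection models return a loss dict in train mode

    if metrics is None:
        # Return raw model outputs only; the user post-processes and computes metrics.
        return {"preds": outputs}
    if metrics == "native":
        # Placeholder: compute the torchvision-style metrics Quickvision returns today.
        return {"preds": outputs, "metrics": {}}
    if metrics == "coco":
        # Placeholder: run post-processing and COCO-style evaluation here.
        return {"preds": outputs, "metrics": {}}
    raise ValueError(f"Unsupported metrics option: {metrics}")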

Since the PyTorch Lightning API is completely decoupled from the PyTorch one, people can already subclass it and use PL Metrics. E.g.

class CustomTraining(lit_frcnn):
    def validation_step(self, batch, batch_idx):
        images, targets = batch
        targets = [{k: v for k, v in t.items()} for t in targets]

        out = self.model(images)
        # Do whatever metrics you like!

Metrics with Lightning are easier; we could use PL metrics and calculate these on the fly with a param in the Trainer. Not sure about this, but it should be easily possible.
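For example, the subclass above could accumulate detection mAP on the fly with torchmetrics (a hedged sketch: it assumes a recent torchmetrics release that ships MeanAveragePrecision and a recent Lightning version with the on_validation_epoch_end hook; none of this is part of Quickvision today):

from torchmetrics.detection import MeanAveragePrecision

class CustomTraining(lit_frcnn):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.val_map = MeanAveragePrecision()

    def validation_step(self, batch, batch_idx):
        images, targets = batch
        targets = [{k: v for k, v in t.items()} for t in targets]

        preds = self.model(images)  # list of dicts with "boxes", "scores", "labels"
        self.val_map.update(preds, targets)

    def on_validation_epoch_end(self):
        self.log("val_map", self.val_map.compute()["map"])
        self.val_map.reset()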

Alternatives

This is open to discussion; a really neat solution could be a major feature upgrade.

Additional context

This will solve a lot of issues like #62 (being able to use the COCO API for detection). We should not have that dependency. So COCO is still a little confusing; with metrics=None, people can post-process (with utilities we provide) and use the COCO API.

cc @zlapp @zhiqwang @hassiahk

oke-aditya avatar Dec 15 '20 12:12 oke-aditya

Metrics is a little abstract; maybe we could also integrate wandb or tensorboard into quickvision to show the performance of a training strategy.

zhiqwang avatar Dec 15 '20 12:12 zhiqwang

People can use wandb / tensorboard, e.g. a wandb notebook works out of the box with the current metrics.

We should probably avoid a wandb dependency. Again, PyTorch Lightning provides these loggers out of the box, and the current API is flexible with plain PyTorch too.

By avoiding integrating wandb or such loggers, people can manually log whatever, whenever, wherever they like.
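For instance (a tiny sketch; the project name and metric values are made up, and wandb.init / wandb.log are standard wandb calls, not anything Quickvision provides):

import wandb

wandb.init(project="quickvision-experiments")  # example project name

# Wherever you compute or receive metrics, log them directly with whatever keys you like.
wandb.log({"epoch": 0, "train_loss": 0.42})  # made-up values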

oke-aditya avatar Dec 15 '20 12:12 oke-aditya

After a few discussions over Slack with @zlapp and the PyTorch Lightning team: Lightning will have chainable metrics with 1.2 (there is an issue for it). We cannot pass trainer flags for Lightning, and that might not be possible soon.

Should we start this refactor? It would mean updating tests a bit, as well as docs, since the output would now be controlled by the metrics param. We should probably have consistency within the API. Let's think it over.

oke-aditya avatar Dec 16 '20 08:12 oke-aditya

So here is the API proposal for metrics. Let's not tie pycocotools to Quickvision.

This is for an extra param to train_step and val_step.

metrics: bool = False

if metrics is False:
    # Just output what the model does and return it.
    # People can then manually calculate metrics and play around as they like.
    # We do not do post-processing; the utilities we provide can be used by the user.

    # preds contains the predictions made on the train data.
    # For detection models from torchvision we can simply return losses in train and boxes in eval.
    # preds will contain the output of the model, e.g. the output tensor in classification,
    # or DETR pred logits and boxes.
    # Possibly it will be a nested dictionary with consistent names.
    return {"preds": ...}

else:
    # Use the torchvision native metrics which we have now.
    # We can expect these to improve over time or adopt Lightning metrics here.
    # The "metrics" key will contain whatever metrics we return right now.
    # Here too, preds is the same as above for consistency.
    return {"preds": [], "metrics": {}}

oke-aditya avatar Dec 21 '20 10:12 oke-aditya

@zlapp using pycocotools or any other metrics will be very simple.

You can just use

train_results = model.train_step(metrics=False)

boxes = train_results["preds"]["boxes"]

train_results will be a dict. Its values will be torch.Tensors, no abstractions here. Now you can post-process or do whatever you like and use pycocotools, etc.
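For instance, once the boxes are converted to COCO result format ([x, y, w, h] plus image_id, category_id and score), they can go straight into pycocotools (a rough sketch; the single hard-coded result and the annotation file path are illustrative assumptions, not Quickvision code):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Illustrative only: one detection in COCO "results" format.
results = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.9},
]

coco_gt = COCO("annotations/instances_val.json")  # assumed path to the ground-truth file
coco_dt = coco_gt.loadRes(results)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()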

Some consistency which we will have:

  1. Every model still returns a dict; it contains preds, or preds + metrics.
  2. What goes inside preds is specific to the model, say classification / detection. We will have some consistency in it, though not 100%, since these are the model outputs and they really would not change. We don't want to hide these outputs from the models.
  3. This allows using custom metrics, pycocotools, or even Lightning metrics. We are returning torch tensors wrapped in a dictionary (which is what even DETR or FRCNN do).

Another alternative is that we do not compute metrics at all, but provide utilities to compute them. I'm neither for nor against it; I think that too can be the way.

oke-aditya avatar Dec 21 '20 10:12 oke-aditya

OK, so we can now use torchmetrics. It has absolutely no required dependencies, so we can easily add it.

Also, we will use it to calculate metrics. In case any metric is not available, we will keep it in quickvision.metrics.
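As a small illustration (a hedged sketch assuming torchmetrics >= 0.11, which exposes MulticlassAccuracy; the number of classes and the tensors are made-up example values), a classification step could build the proposed return dict like this:

import torch
from torchmetrics.classification import MulticlassAccuracy

accuracy = MulticlassAccuracy(num_classes=10)  # example number of classes

logits = torch.randn(8, 10)           # stand-in for model outputs
labels = torch.randint(0, 10, (8,))   # stand-in for targets

print({"preds": logits, "metrics": {"accuracy": accuracy(logits, labels)}})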

oke-aditya avatar Apr 04 '21 07:04 oke-aditya