
How to get validation loss and display it in Tensorboard

Zixiu99 opened this issue 1 year ago • 8 comments

[Screenshot: TensorBoard showing only training-loss and learning-rate curves] The current project logs only the training loss and learning-rate curves. How can I modify def train_one_epoch() to also compute the validation loss during training?

Zixiu99 avatar Jun 10 '24 14:06 Zixiu99

Additionally, is there a way to modify the code so that the x-axis of the TensorBoard graphs represents epochs instead of batches?

Zixiu99 avatar Jun 10 '24 14:06 Zixiu99

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jul 11 '24 01:07 github-actions[bot]

I would also like to ask if there's a simple solution to this.

  • Getting the train loss is simple: the model function in train_utils.py/train_one_epoch() already returns it.
  • Creating predictions each epoch is also simple by just running evaluation after each epoch, as explained here: #840. This evaluation also returns other metrics such as recall and AOS.
  • However, I'm not able to get loss values from evaluation: the model function doesn't return them, and there doesn't seem to be a built-in utility that computes them from the given config, ground truths, and predictions.
  • I'm currently using a custom implementation of GIoU-3D for this (I'm mainly interested in positional loss and less so in classification loss), but that doesn't seem like the right way to go.

I'm new to object detection networks, so I'm not sure if loss is utilised the same way here as elsewhere. I want to use the validation loss for hyperparameter tuning, and both train and validation losses to check for overfitting. Is it more common to use the accuracy metrics PCDet provides for this instead?

ReneFiala avatar Jul 24 '24 03:07 ReneFiala

Okay, I think I figured out how to get validation loss working without custom implementations and other silliness. This applies to the AnchorHead dense head, so your mileage may vary.

  1. Make sure targets are generated even during validation by getting rid of this condition or editing it to always be True. An alternative that doesn't require modifying PCDet's code (which I'm not a fan of) would be to manually call assign_targets() and edit forward_ret_dict from outside, but I haven't looked into obtaining the data_dict parameter. Maybe it's simple.
  2. Call the dense head's get_loss() function after each prediction, for instance in this simplified bit from eval_utils.py (you still need to output it somewhere: to the console, TensorBoard, or a file):
import numpy as np
import torch

losses = []
for i, batch_dict in enumerate(dataloader):
    load_data_to_gpu(batch_dict)
    with torch.no_grad():
        pred_dicts, ret_dict = model(batch_dict)
    # stuff pertaining to pred_dicts and ret_dict omitted
    losses.append(model.dense_head.get_loss()[1]['rpn_loss'])
loss = np.average(losses)

If anyone with more knowledge can chime in whether this is the right way or I'm doing something wrong, I'd be grateful. However, it appears to work.
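For the "output it somewhere" step, one option is to accumulate the per-batch rpn_loss values and log a single per-epoch average. Below is a minimal, self-contained sketch; the LossMeter helper is my own illustration, not part of OpenPCDet:

```python
import numpy as np

class LossMeter:
    """Accumulates per-batch loss values and reports the epoch mean."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(float(value))

    def mean(self):
        # guard against an empty meter so a zero-batch epoch doesn't crash
        return float(np.mean(self.values)) if self.values else 0.0

meter = LossMeter()
for batch_loss in [0.8, 0.6, 0.4]:  # stand-ins for get_loss()[1]['rpn_loss']
    meter.update(batch_loss)
epoch_loss = meter.mean()  # single scalar to log for the epoch
```

When writing to TensorBoard, passing the epoch index as global_step (e.g. tb_log.add_scalar('val/rpn_loss', meter.mean(), cur_epoch)) also makes the x-axis count epochs rather than batches, which addresses the earlier question about the x-axis.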

ReneFiala avatar Aug 10 '24 20:08 ReneFiala


Hi, did it work for you?

jyothsna-phd22 avatar Sep 12 '24 13:09 jyothsna-phd22

Were you able to compute the validation loss during training?

jyothsna-phd22 avatar Sep 12 '24 13:09 jyothsna-phd22


This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Oct 27 '24 02:10 github-actions[bot]

Thank you very much for your reply; your method does work so far! I commented out the condition, and then defined a function to compute the validation-set loss after each training epoch.

import torch

def compute_val_loss(model, val_loader, logger):
    from pcdet.models import load_data_to_gpu

    training_status = model.training
    # model.eval() skips target assignment, so the model is kept in train
    # mode; note that train mode still updates BatchNorm running statistics
    # even under torch.no_grad()
    model.train()
    total_val_loss = 0.0
    num_batches = 0

    with torch.no_grad():
        for batch_dict in val_loader:
            load_data_to_gpu(batch_dict)
            model(batch_dict)
            loss, tb_dict, disp_dict = model.get_training_loss()
            total_val_loss += loss.item()
            num_batches += 1

    avg_val_loss = total_val_loss / max(num_batches, 1)
    logger.info(f'[compute_val_loss] val_loss = {avg_val_loss:.6f}')

    # restore whatever mode the model was in before this call
    if not training_status:
        model.eval()

    return avg_val_loss

Currently I'm verifying that it's correct, and trying to log it to TensorBoard for visualization.
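The TensorBoard wiring could look like the sketch below. The names (tb_log, compute_val_loss, the epoch loop) follow the thread, but they are stubbed here with hypothetical stand-ins so the sketch runs on its own; FakeTBLog mimics the add_scalar method of torch.utils.tensorboard.SummaryWriter:

```python
class FakeTBLog:
    """Stand-in for torch.utils.tensorboard.SummaryWriter."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, global_step):
        self.scalars.append((tag, value, global_step))

def compute_val_loss_stub(epoch):
    # placeholder for the real compute_val_loss(model, val_loader, logger)
    return 1.0 / (epoch + 1)

def train_loop(total_epochs, tb_log):
    for cur_epoch in range(total_epochs):
        # ... one epoch of training (train_one_epoch) would run here ...
        val_loss = compute_val_loss_stub(cur_epoch)
        # using cur_epoch (not the batch counter) as global_step makes the
        # TensorBoard x-axis count epochs rather than batches
        tb_log.add_scalar('val/loss', val_loss, cur_epoch)

tb = FakeTBLog()
train_loop(3, tb)
```

With the real SummaryWriter, the 'val/loss' curve then appears alongside the existing training-loss curve, with one point per epoch.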

Zixiu99 avatar Feb 27 '25 13:02 Zixiu99