
How to get val loss in 3.x?

devin-ry opened this issue 2 years ago · 20 comments

I have seen Validation Loss During Training (#7971), but there is no workflow option in _base_/default_runtime.py. My mmdetection version is 3.x.
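For reference, in mmdetection 2.x this was typically enabled through the workflow option in the runtime config, which no longer exists in 3.x. A sketch of the old 2.x-style setting:

# mmdetection 2.x only; this option was removed in 3.x
workflow = [('train', 1), ('val', 1)]  # run a val pass (computing losses) after each train epoch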

devin-ry avatar Mar 08 '23 12:03 devin-ry

Sorry, the val loss calculation in version 3.x is not yet supported. We will support it in the next few releases.

RangiLyu avatar Mar 09 '23 10:03 RangiLyu

Any updates on this @RangiLyu ?

abdksyed avatar Apr 12 '23 11:04 abdksyed

Why doesn't mmdetection 3.x include validation loss? Has it been removed for a specific reason? This is a critical feature because, without validation loss, we cannot assess whether a model is overfitting or generalizing. Does mmdetection suggest any alternative methods for addressing this? I'm feeling concerned and confused because I couldn't find anything related to this issue in the documentation.

thanujan96 avatar Sep 17 '23 18:09 thanujan96

No, validation loss is not supported yet. The only way to check for overfitting is to compare mAP scores on the training and validation data.

There is a way to get validation loss, but it's more of a hack, implemented by adding custom hooks to the pipeline. A minimal skeleton is sketched below.
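In outline (a sketch, assuming the mmengine 3.x Hook API; the class name is illustrative, and fuller examples appear later in this thread):

from mmengine.hooks import Hook
from mmdet.registry import HOOKS


@HOOKS.register_module()
class ValLossHook(Hook):  # illustrative name, not an official hook
    def after_val_epoch(self, runner, metrics=None):
        # Run the model over the validation data in loss mode here
        # and log the aggregated losses.
        ...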

abdksyed avatar Sep 18 '23 16:09 abdksyed

Can you point me to anything more on that hook-based hack?

tomhruby1 avatar Sep 19 '23 08:09 tomhruby1

no update yet?

Vaaaaaalllll avatar Nov 28 '23 00:11 Vaaaaaalllll

No, validation loss is not supported yet. The only way to check for overfitting is to compare mAP scores on the training and validation data.

There is a way to get validation loss, but it's more of a hack, implemented by adding custom hooks to the pipeline.

@abdksyed could you share how to log mAP on the training dataset while training?

willcray avatar Dec 16 '23 00:12 willcray

@tomhruby1 @willcray

# Imports assumed by this hook (adjust paths to your own project)
import os
import shutil

import cv2
import numpy as np
import torch
import wandb
from moviepy.editor import VideoFileClip
from pycocotools.coco import COCO
from skimage import io
from torchmetrics.classification import BinaryJaccardIndex

from mmengine.hooks import Hook
from mmdet.apis import inference_detector, init_detector
from mmdet.registry import HOOKS


@HOOKS.register_module()
class FindIoU(Hook):
    def __init__(self, name):
        os.makedirs("bestepochs", exist_ok=True)
        # Some Necessary Variables for me
        self.bestIoU = 0
        self.bestepoch = None
        self.name = name
        self.metric = BinaryJaccardIndex()
        
        # RGB format
        self.CLS2COLOR = {
            1: (228,0,120), # Red
            2: (42, 82, 190), # Blue
            3: (3, 192, 60) # Green
        }
        
        # define our custom x axis metric
        wandb.define_metric("coco/epoch")
        # define which metrics will be plotted against it
        # My OWN Custom Metrics, YOU CAN HAVE YOUR LOSS METRIC HERE
        wandb.define_metric(
          "coco/pGen1IoU", step_metric="coco/epoch", step_sync=False)
        wandb.define_metric(
          "coco/pGen2IoU", step_metric="coco/epoch", step_sync=False)
        wandb.define_metric(
          "coco/meanIoU", step_metric="coco/epoch", step_sync=False)
        
        self.artifact = wandb.Artifact(self.name, type='model')
        
    def after_val(self, runner, **kwargs):
        # TO LOAD THE MODEL FROM THE RECENT WEIGHT FILE
        checkpoint_file = runner.work_dir + f"/epoch_{runner.epoch}.pth"
        model = init_detector(runner.cfg, checkpoint_file, device='cuda:0')
        meanIoU = []
        val_file = runner.cfg.val_dataloader.dataset.ann_file
        test_file = runner.cfg.test_dataloader.dataset.ann_file
        for f_type, json_path in zip(['pGen1', 'pGen2'], [val_file, test_file]):
            # Reset per file so each mean covers only its own dataset
            IoUs = []
            coco = COCO(json_path)
            img_dir = "combined_data"
            cat_ids = coco.getCatIds()
            frames = {}
            for idx, img_data in coco.imgs.items():
                anns_ids = coco.getAnnIds(imgIds=img_data['id'], catIds=cat_ids, iscrowd=None)
                anns = coco.loadAnns(anns_ids)

                truth_mask = coco.annToMask(anns[0])
                for i in range(1,len(anns)):
                    truth_mask = np.maximum(truth_mask,coco.annToMask(anns[i])*1)

                img = f'{img_dir}/{img_data["file_name"]}'  # or img = mmcv.imread(img), which will only load it once
                # PERFORMING INFERENCE
                result = inference_detector(model, img)
                # outputs = predictor(im)

                pred_mask = np.zeros_like(truth_mask)
                for i in result.pred_instances.masks.type(torch.int8):
                    pred_mask = np.maximum(pred_mask, i.to('cpu').numpy().astype(np.uint8))
                    
                # frame = label2rgb(pred_mask, cv2.imread(img), alpha=0.3, bg_label=0)*255
    
                target = torch.tensor(truth_mask)
                preds = torch.tensor(pred_mask)
            
                intersection_mask = np.logical_and(pred_mask == 1, truth_mask == 1)
                pred_mask[truth_mask == 1] = 2
                pred_mask[intersection_mask] = 3
                # Repeating Channels to make it three channels
                pred_mask = np.tile(pred_mask[..., np.newaxis], (1,1,3))
                
                # red -> Wrong Predicted, blue -> Ground Truth, green -> Correct Predicted
                frame = io.imread(img)
                for color_id in range(1,4):
                    mask = np.where(pred_mask == (color_id,)*3, self.CLS2COLOR[color_id], 0).astype('uint8')
                    frame = cv2.addWeighted(frame, 1.0, mask, 0.5, 0)
                
                frames[img_data["file_name"]] = frame

                IoUs.append(self.metric(preds, target).item())
                
            
            size1,size2,_ = frame.shape
            out = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 1, (size2, size1), True)
            # Sorting the frames according to frame number eg: p3_frame_000530..PNG
            for _,i in sorted(frames.items(), key=lambda x: x[0]):
                out_img = cv2.cvtColor(i, cv2.COLOR_BGR2RGB)
                out.write(out_img)
            out.release()
            
            # Convert MPV4 codec to libx264 codec
            input_file = 'output.mp4'
            output_file = f_type+'.mp4'
            clip = VideoFileClip(input_file)
            clip.write_videofile(output_file, codec='libx264')

            # Collect all meanIoUs for all Generalization Patients
            meanIoU.append(sum(IoUs)/len(IoUs))
            print(f"IoU: {sum(IoUs)/len(IoUs)}")
            
            # axes are (time, channel, height, width)
            wandb.log({f"{self.name}_{f_type}_epoch_{runner.epoch}": wandb.Video(output_file)})
            
        for IoU, log in zip(meanIoU, ['pGen1', 'pGen2']):
            # Key matches the metrics defined via wandb.define_metric above
            wandb.log({f'coco/{log}IoU': IoU, 'coco/epoch': runner.epoch})
            
        meanIoU = sum(meanIoU)/len(meanIoU)
        if meanIoU > self.bestIoU:
            self.bestIoU = meanIoU
            self.bestepoch = checkpoint_file

        print(f"meanIoU: {meanIoU}")
        wandb.log({'coco/meanIoU': meanIoU, 'coco/epoch': runner.epoch})
        
        print(f"Saving checkpoint of epoch {runner.epoch} to wandb")
        self.artifact.add_file(checkpoint_file, name=f'epoch_{runner.epoch}.pth')
        # wandb.log_artifact(self.artifact)

    def after_run(self, runner, **kwargs):
        shutil.copy(self.bestepoch, f"bestepochs/{self.name}.pth")
        print("Saving best checkpoint to wandb")
        self.artifact.add_file(self.bestepoch, name="best.pth")
        wandb.log_artifact(self.artifact)

This is a hook I implemented to compute IoU values after each epoch. Here, the after_val method runs after validation has been performed, and after_run is called after the entire run. There are similar methods like before_val and so on, which are described in the docs (https://mmdetection.readthedocs.io/en/latest/user_guides/useful_hooks.html#how-to-implement-a-custom-hook).

I was running inference to get the predicted masks, computing IoU against the ground-truth masks, and also creating videos of the frames and saving them to Weights & Biases. You can change the logic of the code, but the hook method names will stay the same for you.

There is some inefficiency: I perform inference again on the validation/test data to compute IoU, even though inference is already run on the validation data during training to produce the mAP values. I couldn't find a way to access the results of the validation pass that had already been performed, so I had to run inference again.
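In case it helps others: enabling a hook like this is done from the config. A sketch, assuming the class above lives in a module your config can import (the module path here is illustrative):

# in your config file
custom_imports = dict(
    imports=['my_project.hooks.find_iou'],  # illustrative module path
    allow_failed_imports=False)

custom_hooks = [
    dict(type='FindIoU', name='my_experiment')
]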

abdksyed avatar Dec 16 '23 02:12 abdksyed

@abdksyed thanks for sharing this code. This appears to be a way to compute mask IoU over the validation set. You mentioned that there's a way to get the mAP on the train set as well:

No, validation loss is not supported yet. The only way to check for overfitting is to compare mAP scores on the training and validation data.

There is a way to get validation loss, but it's more of a hack, implemented by adding custom hooks to the pipeline.

It appears that the same custom-hook approach from the docs link you provided above could be used. Perhaps something with after_train_epoch or similar?

willcray avatar Dec 18 '23 17:12 willcray

@willcray

Yes, for train loss you can use after_train_epoch, and for val loss something similar, such as after_val_epoch.
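A minimal sketch of such a hook, assuming the mmengine 3.x APIs (runner.val_loop.dataloader, model.data_preprocessor, model.parse_losses) and single-GPU training; ValLossHook is an illustrative name, not an official mmdetection hook:

import torch
from mmengine.hooks import Hook
from mmdet.registry import HOOKS


@HOOKS.register_module()
class ValLossHook(Hook):
    """Illustrative hook: forward the val set in loss mode after each val epoch."""

    def after_val_epoch(self, runner, metrics=None):
        model = runner.model
        # Unwrap a (distributed) data-parallel wrapper if present.
        if hasattr(model, 'module'):
            model = model.module
        # The runner has already put the model in eval mode at this point,
        # so BN statistics and dropout are frozen.
        total_loss, num_batches = 0.0, 0
        with torch.no_grad():
            for data in runner.val_loop.dataloader:
                # The data_preprocessor moves the batch to the right device and
                # packs it into `inputs` / `data_samples`; training=True so it
                # formats the batch as it would during training.
                data = model.data_preprocessor(data, training=True)
                losses = model(**data, mode='loss')
                loss, _ = model.parse_losses(losses)
                total_loss += loss.item()
                num_batches += 1
        runner.logger.info(
            f'val loss (epoch {runner.epoch}): '
            f'{total_loss / max(num_batches, 1):.4f}')

Enable it with custom_hooks = [dict(type='ValLossHook')] in the config, just like the FindIoU example above.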

abdksyed avatar Dec 19 '23 22:12 abdksyed

Does anyone have an example script that gets the validation loss using the hook approach?

tmargary avatar Jan 09 '24 11:01 tmargary

Same question, can anyone share how to get the validation loss using the hook approach?

Cindy0725 avatar Jan 22 '24 08:01 Cindy0725

This would be a very useful feature; I would appreciate an update on this, @RangiLyu.

EmmaMeeus avatar Jan 22 '24 09:01 EmmaMeeus

Any update on the feature?

Roger-F avatar Apr 04 '24 22:04 Roger-F

Does anyone have an example script that gets the validation loss using the hook approach?

This may be useful: https://github.com/open-mmlab/mmdetection/issues/11331#issuecomment-2069851373

g824718114 avatar Apr 22 '24 15:04 g824718114