                        How to get val loss in 3.x?
I have seen Validation Loss During Training #7971, but there is no workflow setting in base/default_runtime.py. My mmdetection version is 3.x.
Sorry, val loss calculation is not yet supported in version 3.x. We will support it in one of the next few releases.
Any updates on this @RangiLyu ?
Why doesn't mmdetection 3.x include validation loss? Has it been removed for a specific reason? This is a critical feature: without validation loss, we cannot assess whether a model is overfitting or generalizing. Does mmdetection suggest any alternative method for addressing this? I'm concerned and confused because I couldn't find anything related to this issue in the documentation.
No, validation loss is not supported yet. The only way to check for overfitting is to compare the mAP scores on the training and validation data.
There is a way to get validation loss, but it's more of a hack: creating custom hooks in the pipeline.
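(Not stated in the thread, but for reference: one way to get mAP on the training data is to run tools/test.py with a config whose test dataloader and evaluator point at the train split. A minimal COCO-style sketch; all paths are placeholders for your own dataset:)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]
test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',  # train split, not val
        data_prefix=dict(img='train2017/'),
        test_mode=True,
        pipeline=test_pipeline))
test_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_train2017.json',
    metric=['bbox', 'segm'])

Merging this fragment into a config and running python tools/test.py your_config.py work_dirs/your_run/epoch_12.pth then reports mAP on the training split, which can be compared against the usual validation mAP to gauge overfitting.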
Is there anything more on the hook-based hack that you can point me to?
No update yet?
No, validation loss is not supported yet. The only way to check for overfitting is to compare the mAP scores on the training and validation data.
There is a way to get validation loss, but it's more of a hack: creating custom hooks in the pipeline.
@abdksyed could you share how to log mAP on the training dataset while training?
@tomhruby1 @willcray
import os
import shutil

import cv2
import numpy as np
import torch
import wandb
from moviepy.editor import VideoFileClip
from pycocotools.coco import COCO
from skimage import io
from torchmetrics.classification import BinaryJaccardIndex

from mmdet.apis import inference_detector, init_detector
from mmdet.registry import HOOKS
from mmengine.hooks import Hook


@HOOKS.register_module()
class FindIoU(Hook):
    def __init__(self, name):
        os.makedirs("bestepochs", exist_ok=True)
        # Some Necessary Variables for me
        self.bestIoU = 0
        self.bestepoch = None
        self.name = name
        self.metric = BinaryJaccardIndex()
        
        # RGB format
        self.CLS2COLOR = {
            1: (228,0,120), # Red
            2: (42, 82, 190), # Blue
            3: (3, 192, 60) # Green
        }
        
        # define our custom x axis metric
        wandb.define_metric("coco/epoch")
        # define which metrics will be plotted against it
        # These are my own custom metrics; you can define your loss metric here instead
        wandb.define_metric(
          "coco/pGen1IoU", step_metric="coco/epoch", step_sync=False)
        wandb.define_metric(
          "coco/pGen2IoU", step_metric="coco/epoch", step_sync=False)
        wandb.define_metric(
          "coco/meanIoU", step_metric="coco/epoch", step_sync=False)
        
        self.artifact = wandb.Artifact(self.name, type='model')
        
    def after_val(self, runner, **kwargs):
        # Load the model from the checkpoint saved for the current epoch
        checkpoint_file = runner.work_dir + f"/epoch_{runner.epoch}.pth"
        model = init_detector(runner.cfg, checkpoint_file, device='cuda:0')
        meanIoU = []
        val_file = runner.cfg.val_dataloader.dataset.ann_file
        test_file = runner.cfg.test_dataloader.dataset.ann_file
        for f_type, json_path in zip(['pGen1', 'pGen2'], [val_file, test_file]):
            # Reset per-dataset IoUs so each split gets its own mean
            IoUs = []
            
            # json_path = f"{data_type}.json"
            coco = COCO(json_path)
            img_dir = "combined_data"
            cat_ids = coco.getCatIds()
            frames = {}
            for idx, img_data in coco.imgs.items():
                anns_ids = coco.getAnnIds(imgIds=img_data['id'], catIds=cat_ids, iscrowd=None)
                anns = coco.loadAnns(anns_ids)
                # Merge all annotation masks into one binary ground-truth mask
                truth_mask = coco.annToMask(anns[0])
                for i in range(1,len(anns)):
                    truth_mask = np.maximum(truth_mask,coco.annToMask(anns[i])*1)
                img = f'{img_dir}/{img_data["file_name"]}'  # or img = mmcv.imread(img), which will only load it once
                # Run inference with the freshly loaded checkpoint
                result = inference_detector(model, img)
                # Merge all predicted instance masks into one binary mask
                pred_mask = np.zeros_like(truth_mask)
                for i in result.pred_instances.masks.type(torch.int8):
                    pred_mask = np.maximum(pred_mask, i.to('cpu').numpy().astype(np.uint8))
                    
                # frame = label2rgb(pred_mask, cv2.imread(img), alpha=0.3, bg_label=0)*255
    
                target = torch.tensor(truth_mask)
                preds = torch.tensor(pred_mask)
            
                # Relabel for visualization: 1 = predicted only (wrong), 2 = ground truth only (missed), 3 = overlap (correct)
                intersection_mask = np.logical_and(pred_mask == 1, truth_mask == 1)
                pred_mask[truth_mask == 1] = 2
                pred_mask[intersection_mask] = 3
                # Repeating Channels to make it three channels
                pred_mask = np.tile(pred_mask[..., np.newaxis], (1,1,3))
                
                # red -> Wrong Predicted, blue -> Ground Truth, green -> Correct Predicted
                frame = io.imread(img)
                for color_id in range(1,4):
                    mask = np.where(pred_mask == (color_id,)*3, self.CLS2COLOR[color_id], 0).astype('uint8')
                    frame = cv2.addWeighted(frame, 1.0, mask, 0.5, 0)
                
                frames[img_data["file_name"]] = frame
                IoUs.append(self.metric(preds, target).item())
                
            
            size1,size2,_ = frame.shape
            out = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 1, (size2, size1), True)
            # Sort the frames by file name, e.g. p3_frame_000530.PNG
            for _, frame_img in sorted(frames.items(), key=lambda x: x[0]):
                # Frames were composed in RGB; cv2.VideoWriter expects BGR
                out_img = cv2.cvtColor(frame_img, cv2.COLOR_RGB2BGR)
                out.write(out_img)
            out.release()
            
            # Re-encode from the mp4v codec to libx264
            input_file = 'output.mp4'
            output_file = f_type+'.mp4'
            clip = VideoFileClip(input_file)
            clip.write_videofile(output_file, codec='libx264')
            # Collect all meanIoUs for all Generalization Patients
            meanIoU.append(sum(IoUs)/len(IoUs))
            print(f"IoU: {sum(IoUs)/len(IoUs)}")
            
            # axes are (time, channel, height, width)
            wandb.log({f"{self.name}_{f_type}_epoch_{runner.epoch}": wandb.Video(output_file)})
            
        # Keys match the metrics defined in __init__ (coco/pGen1IoU, coco/pGen2IoU)
        for IoU, log in zip(meanIoU, ['pGen1', 'pGen2']):
            wandb.log({f'coco/{log}IoU': IoU, 'coco/epoch': runner.epoch})
            
        meanIoU = sum(meanIoU)/len(meanIoU)
        if meanIoU > self.bestIoU:
            self.bestIoU = meanIoU
            self.bestepoch = checkpoint_file
        print(f"meanIoU: {meanIoU}")
        wandb.log({'coco/meanIoU': meanIoU, 'coco/epoch': runner.epoch})
        
        print(f"Saving checkpoint of epoch {runner.epoch} to wandb")
        self.artifact.add_file(checkpoint_file, name=f'epoch_{runner.epoch}.pth')
        # wandb.log_artifact(self.artifact)

    def after_run(self, runner, **kwargs):
        # Copy the best checkpoint aside and upload it once the run ends
        shutil.copy(self.bestepoch, f"bestepochs/{self.name}.pth")
        print("Saving best checkpoint to wandb")
        self.artifact.add_file(self.bestepoch, name="best.pth")
        wandb.log_artifact(self.artifact)
This is a hook I implemented to compute IoU values after each epoch. The after_val method runs after validation has been performed, and after_run is called after the entire run finishes. There are similar methods like before_val and so on, which are described in the docs (https://mmdetection.readthedocs.io/en/latest/user_guides/useful_hooks.html#how-to-implement-a-custom-hook).
I run inference and take the predicted masks to compute IoU against the ground-truth masks, and I also create videos of the frames and save them to Weights & Biases. You can change the logic of the code, but the hook method names will be the same for you.
There is some inefficiency: I perform inference again on the validation/test data to compute IoU, even though during training inference is already run on the validation data to get the mAP values. I couldn't find a way to access the results of the validation that had already been performed, so I had to run inference again.
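(A note on that inefficiency, as an untested sketch rather than something posted in the thread: MMEngine hooks also expose after_val_epoch, which receives the metrics the validation loop has already computed, so those numbers can be reused without a second inference pass. The class name below is made up, and the metric keys depend on your evaluator.)

from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class LogValMetricsHook(Hook):
    """Hypothetical hook: reuse metrics already computed by the val loop."""

    def after_val_epoch(self, runner, metrics=None):
        # `metrics` holds whatever the configured evaluator produced,
        # e.g. {'coco/bbox_mAP': 0.41, 'coco/segm_mAP': 0.37, ...}
        if metrics:
            for name, value in metrics.items():
                runner.logger.info(f'epoch {runner.epoch}: {name} = {value}')

Either hook is enabled by registering it as above and listing it in the config, e.g. custom_hooks = [dict(type='FindIoU', name='my_experiment')], with the defining module imported via custom_imports = dict(imports=['path.to.your.hooks']).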
@abdksyed thanks for sharing this code. This appears to be a way to compute mask IoU over the validation set. You mentioned that there's a way to get the mAP on the train set as well:
No, validation loss is not supported yet. The only way to check for overfitting is to compare the mAP scores on the training and validation data.
There is a way to get validation loss, but it's more of a hack: creating custom hooks in the pipeline.
It appears that the same custom-hook approach from the docs link you provided above could be used for this. Perhaps with after_train_epoch or something similar?
@willcray
Yes, for train loss you can use after_train_epoch, and similarly for val loss.
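(No worked example was posted in this thread; the following is a minimal, untested sketch of the hook hack for validation loss in 3.x. After each training epoch it reruns the model in loss mode over the validation dataloader and logs the averaged loss dict. It assumes a single, non-distributed GPU run and a validation pipeline that loads annotations; the class name is made up.)

import torch
from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class ValLossHook(Hook):
    """Hypothetical hook: compute the training loss dict on the val set."""

    def after_train_epoch(self, runner):
        model = runner.model  # with DDP you would need runner.model.module
        model.eval()
        totals, n_batches = {}, 0
        with torch.no_grad():
            for data_batch in runner.val_dataloader:
                # Same preprocessing as a train step, then ask the model
                # for its losses instead of predictions.
                data = model.data_preprocessor(data_batch, training=True)
                losses = model(**data, mode='loss')
                for key, value in losses.items():
                    # Some heads return a list of per-level losses.
                    if isinstance(value, (list, tuple)):
                        value = sum(v.mean() for v in value)
                    totals[key] = totals.get(key, 0.0) + value.mean().item()
                n_batches += 1
        model.train()
        for key, value in totals.items():
            runner.logger.info(
                f'epoch {runner.epoch} val_{key}: {value / n_batches:.4f}')

It would be enabled the same way as the other hooks above, via custom_hooks in the config.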
Does anyone have an example script that gets the validation loss using the hook approach?
Same question: can anyone share how to get the validation loss using the hook approach?
This would be a very useful feature; I would appreciate an update on this, @RangiLyu.
Any update on the feature?
Does anyone have an example script that gets the validation loss using the hook approach?
This may be useful: https://github.com/open-mmlab/mmdetection/issues/11331#issuecomment-2069851373