
Is it possible to calculate a validation loss?

Open psantiago-lsbd opened this issue 1 year ago • 0 comments

I want to run an experiment with an object detection model that I trained, in order to understand the images in my test set a little better. For each image in the test dataset, I would like to obtain individual metrics as well as the (validation) loss. My current code is as follows:

import mmcv
import torch
from mmcv import Config
from mmcv.parallel import scatter
from mmdet.apis import init_detector
from mmdet.datasets import build_dataloader, build_dataset

config_file = 'swin/custom_mask_rcnn_swin-s-p4-w7_fpn_fp16_ms-crop-3x_coco.py'
checkpoint_file = 'mmdet/swin/epoch_40.pth'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cfg = Config.fromfile(config_file)

# Build the test dataset once and a non-shuffled, non-distributed loader over it.
dataset = build_dataset(cfg.data.test)
data_loader = build_dataloader(
    dataset,
    samples_per_gpu=cfg.data.samples_per_gpu,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=False,
    shuffle=False
)

model = init_detector(config_file, checkpoint_file, device=device)
model.eval()

prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader):
    with torch.no_grad():
        # Move the batch to the target device.
        data = scatter(data, [device])[0]
        result = model(return_loss=True, **data)
        prog_bar.update()

# Print the output of the last batch only (one element per image).
for elem in result:
    print(elem)

For each image, I get a tuple of lists of NumPy arrays as the output, like the example below:

([array([], shape=(0, 5), dtype=float32), array([[1.3915492e+02, 2.6474759e+02, 1.5779597e+02, 2.9583871e+02,
        1.4035654e-01],
       [1.3974554e+02, 2.6289932e+02, 1.6024533e+02, 3.1977335e+02,
        9.8360136e-02]], dtype=float32), array([[8.7633228e+02, 2.1958812e+02, 8.8837036e+02, 2.4044472e+02,
        7.2545749e-01]], dtype=float32), array([], shape=(0, 5), dtype=float32), array([[8.7622699e+02, 2.1944591e+02, 8.8825470e+02, 2.4135008e+02,
        2.4266671e-01]], dtype=float32)], [[], [array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]]), array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])], [array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])], [], [array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])]])
([array([], shape=(0, 5), dtype=float32), array([], shape=(0, 5), dtype=float32), array([], shape=(0, 5), dtype=float32), array([], shape=(0, 5), dtype=float32), array([[2.9585770e+02, 3.7026846e+02, 3.1038290e+02, 3.8506241e+02,
        5.9989877e-02]], dtype=float32)], [[], [], [], [], [array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])]])

My intuition tells me that this output is something like bounding box positions (x, y, w, h), a confidence score per class, and binary mask information, given the boolean values. I also passed the argument "return_loss=True", so I imagine that some of this information must be related to the loss that I want to obtain. How can I parse this output, that is, identify what each piece of information in these results represents, so that I can find the loss I am looking for?
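To make the parsing question concrete, here is a minimal sketch of how such a result could be flattened. It assumes the MMDetection 2.x convention for Mask R-CNN style models: each per-image result is a tuple (bbox_results, segm_results), both lists indexed by class, where each bbox array has shape (n, 5) holding [x1, y1, x2, y2, score] (corner coordinates, not width and height) and each mask is a boolean H x W array. The function name parse_result, the score_thr parameter, and the synthetic input are hypothetical and only for illustration.

```python
import numpy as np


def parse_result(result, score_thr=0.3):
    """Flatten one per-image (bbox_results, segm_results) tuple into dicts.

    Assumed layout (MMDetection 2.x, Mask R-CNN style):
      bbox_results[c] -> (n, 5) float array, rows of [x1, y1, x2, y2, score]
      segm_results[c] -> list of n boolean (H, W) masks for class c
    """
    bbox_results, segm_results = result
    detections = []
    for class_id, (bboxes, masks) in enumerate(zip(bbox_results, segm_results)):
        for bbox, mask in zip(bboxes, masks):
            score = float(bbox[4])
            if score < score_thr:
                continue  # drop low-confidence detections
            detections.append({
                "class_id": class_id,
                "bbox_xyxy": bbox[:4].tolist(),
                "score": score,
                "mask_area": int(np.asarray(mask).sum()),  # pixels set to True
            })
    return detections


# Synthetic two-class result that mimics the structure printed above.
empty = np.zeros((0, 5), dtype=np.float32)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
example = (
    [empty, np.array([[10, 20, 30, 40, 0.9], [5, 5, 8, 8, 0.1]], dtype=np.float32)],
    [[], [mask, np.zeros((4, 4), dtype=bool)]],
)
detections = parse_result(example)
print(detections)
```

Under these assumptions, the printed output contains only predictions (boxes, scores, and masks); none of the arrays is a loss value.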

I'm using MMDetection v2.28.2.
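For context on what a loss output would look like: as far as I understand, in MMDetection 2.x the training-style forward pass returns a dict of named loss terms rather than predictions, and it needs ground-truth annotations (boxes, labels, masks) in the batch, which a test pipeline normally does not load. The sketch below only illustrates how such a dict could be reduced to a single number by summing every entry whose key contains "loss". The helper name total_loss, the key names, and all values are hypothetical, not taken from a real run.

```python
import numpy as np


def total_loss(loss_dict):
    """Reduce a dict of named loss terms to (total, per-term scalars).

    Each value may be a scalar, an array, or a list of them (for example,
    one entry per feature level). Only keys containing "loss" are summed,
    so logged metrics such as accuracy are reported but not added.
    """
    scalars = {}
    for name, value in loss_dict.items():
        if isinstance(value, (list, tuple)):
            scalars[name] = float(sum(np.mean(v) for v in value))
        else:
            scalars[name] = float(np.mean(value))
    total = sum(v for k, v in scalars.items() if "loss" in k)
    return total, scalars


# Hypothetical loss terms for a single image (values are made up).
example_losses = {
    "loss_rpn_cls": [0.02, 0.01],
    "loss_rpn_bbox": [0.03, 0.01],
    "loss_cls": 0.25,
    "acc": 92.0,
    "loss_bbox": 0.2,
    "loss_mask": 0.3,
}
total, scalars = total_loss(example_losses)
print(total, scalars)
```

Running this kind of reduction with one image per batch would give one total per image, which is the per-image validation loss the experiment is after.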

psantiago-lsbd, Jun 07 '24 22:06