
Image cropping & performance on a different dataset

DanielRoeder1 opened this issue 2 years ago · 0 comments

I am currently trying the network trained on the NYU dataset on a separate dataset (the TUM RGBD freiburg2_desk sequence, see https://vision.in.tum.de/data/datasets/rgbd-dataset/download#freiburg2_desk).

I would have expected similar performance, but the RMSE is significantly higher (TUM: 0.73 vs. NYU: 0.38).

Both datasets were recorded with a Microsoft Kinect sensor and consist of indoor scenes.

I am trying to figure out why the model does not generalize well to the TUM dataset.
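
For reference, here is roughly how I compute the RMSE against the TUM depth maps (a minimal sketch; tum_rmse and the pred array are my own names, and the depth scale factor of 5000 comes from the TUM dataset documentation):

    import numpy as np
    from PIL import Image

    def tum_rmse(pred, gt_path):
        # TUM RGBD stores depth as 16-bit PNGs scaled by 5000 (value 5000 = 1 m)
        gt = np.asarray(Image.open(gt_path), dtype=np.float32) / 5000.0
        gt = gt[42:474, 40:616]          # same crop as the RGB input
        valid = gt > 0                   # zero marks missing depth readings
        diff = pred[valid] - gt[valid]
        return float(np.sqrt(np.mean(diff ** 2)))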

I built my own data loading logic, which follows the same steps as the one defined for the NYU dataset:

    import numpy as np
    import torch
    from PIL import Image
    from torchvision import transforms

    # Same normalization as the NYU loader (ImageNet statistics)
    normalize = transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225))

    def predict(img_path):
        # Crop the 640x480 frame to the 576x432 region used for NYU
        img = Image.open(img_path)
        img = img.crop((40, 42, 616, 474))
        img = np.asarray(img, dtype=np.float32) / 255.0
        img = img.transpose((2, 0, 1))        # HWC -> CHW
        img = torch.from_numpy(img)
        img = normalize(img)

        img = img.cuda()                      # note: a bare `img.cuda()` is a no-op
        img = img.unsqueeze(0)                # add batch dimension
        img_flip = torch.flip(img, [3])       # horizontally flipped copy

        # Test-time augmentation: average predictions over both orientations
        with torch.no_grad():
            _, out = Model(img)               # `Model` is the loaded LapDepth network
            _, out_flip = Model(img_flip)
            out_flip = torch.flip(out_flip, [3])
            out = 0.5 * (out + out_flip)

        pred = out[0, 0]                      # drop batch and channel dims -> (H, W)
        return pred
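
Called like this (the file path is just a placeholder):

    pred = predict('rgb/frame_0001.png')      # placeholder TUM RGB frame
    pred = pred.cpu().numpy()                 # (432, 576) depth map in meters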

Do you have any idea why the performance varies so drastically despite the same sensor and similar scenery?

I would also appreciate it if you could help me understand the different image crops in the evaluation. Prediction: pred_uncropped[42:474, 40:616] = pred

Mask for ground truth and prediction: crop_mask[45:471, 41:601] = 1
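
My current understanding of these two steps, as a sketch (assuming 480x640 frames and that gt_depth is the full-resolution ground truth in meters, both names my own):

    import numpy as np

    # Paste the cropped prediction back onto a full-resolution canvas
    pred_uncropped = np.zeros((480, 640), dtype=np.float32)
    pred_uncropped[42:474, 40:616] = pred     # undoes the input crop

    # Second crop: metrics are computed only inside this region
    crop_mask = np.zeros((480, 640), dtype=bool)
    crop_mask[45:471, 41:601] = True

    valid = crop_mask & (gt_depth > 0)        # also require valid ground truth
    errors = pred_uncropped[valid] - gt_depth[valid]

Is that right, and why do the two crop regions differ?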

Thank you for any help & guidance!

DanielRoeder1, Jul 24 '22 13:07