LapDepth-release
Image cropping & performance on a different dataset
I am currently testing the network trained on the NYU dataset on a separate dataset (TUM RGB-D freiburg2_desk, see: https://vision.in.tum.de/data/datasets/rgbd-dataset/download#freiburg2_desk).
I would have expected similar performance, but the RMSE is significantly higher (TUM: 0.73 vs. NYU: 0.38).
Both datasets were recorded with Microsoft Kinect sensors and consist of indoor scenery.
I am trying to figure out why the model cannot generalize well to the TUM dataset.
I built my own data loading logic, which follows the same steps as the one defined for the NYU dataset:
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Normalization with ImageNet statistics, as in the NYU dataloader
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def predict(img_path):
    img = Image.open(img_path)
    img = img.crop((40, 42, 616, 474))   # same border crop as the NYU preprocessing
    img = np.asarray(img, dtype=np.float32) / 255.0
    img = img.transpose((2, 0, 1))       # HWC -> CHW
    img = torch.from_numpy(img)
    img = normalize(img)
    img = img.cuda()                     # .cuda() is not in-place, so reassign
    _, org_h, org_w = img.shape
    img = img.unsqueeze(0)               # add batch dimension
    img_flip = torch.flip(img, [3])      # horizontally flipped copy for test-time augmentation
    with torch.no_grad():
        _, out = Model(img)              # Model: LapDepth network with NYU weights, loaded elsewhere
        _, out_flip = Model(img_flip)
        out_flip = torch.flip(out_flip, [3])
        out = 0.5 * (out + out_flip)     # average the two predictions
    pred = out[0, 0, :, :]               # drop batch and channel dims -> (H, W)
    return pred
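For completeness, here is roughly how I compute the RMSE on TUM. The 16-bit depth PNGs are divided by 5000 as described on the TUM download page, and I reuse the same border crop so the shapes line up; the helper name and file handling below are just a sketch, not my exact evaluation code:

def rmse_against_tum(pred, depth_path):
    # TUM RGB-D depth PNGs are 16-bit with 5000 units per metre
    gt = np.asarray(Image.open(depth_path), dtype=np.float32) / 5000.0
    gt = gt[42:474, 40:616]              # same border crop as the RGB input
    pred = pred.cpu().numpy()            # assumes the output resolution matches the cropped input
    valid = gt > 0                       # skip pixels where the Kinect returned no depth
    return np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2))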
Do you have any idea why the performance differs so drastically despite the same sensor type and similar scenery?
Also, I would appreciate it if you could help me understand the different image crops used in the evaluation (my current reading is sketched after the two lines below): Prediction: pred_uncropped[42:474, 40:616] = pred
Mask for groundtruth and prediction: crop_mask[45:471, 41:601] = 1
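My current understanding, sketched below with a hypothetical gt_full standing in for the full-resolution ground truth, is that the prediction is first pasted back into a full 480x640 frame and the metrics are then restricted to the smaller masked region, which looks like the standard Eigen crop for NYU:

H, W = 480, 640                          # full NYU frame size

# Paste the prediction back at the position it was cropped from
pred_uncropped = np.zeros((H, W), dtype=np.float32)
pred_uncropped[42:474, 40:616] = pred

# Evaluation mask: metrics are only computed inside this region
crop_mask = np.zeros((H, W), dtype=bool)
crop_mask[45:471, 41:601] = True

# gt_full: full-resolution (480x640) ground-truth depth, loaded elsewhere
valid = crop_mask & (gt_full > 0)        # also drop invalid depth pixels
rmse = np.sqrt(np.mean((pred_uncropped[valid] - gt_full[valid]) ** 2))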
Thank you for any help & guidance!