
Question about class SILog

Open · namrata-jangid opened this issue 6 months ago · 1 comment

Hi,

In the forward function of the SILog class, currently only the shape of the var_error tensor is reduced (see line 29 onwards in silog.py):

        if var_error.ndim > 1:
            var_error = var_error.mean(dim=-1)

        if self.integrated > 0.0:
            scale_error = mean_error**2
            var_error = var_error + self.integrated * scale_error * (1 - si.int())

        out_loss = self.output_fn(var_error)
        return out_loss

Shouldn't the shape of the mean_error tensor also be reduced in the exact same way?

Because if that is not done, then for a batch size of 8 (and assuming num_copies=1) mean_error keeps shape [8, 1]; broadcasting it against the [8]-shaped var_error gives an out_loss tensor of torch.Size([8, 8]) instead of torch.Size([8]). Is the mean then computed correctly in class UniDepthV2?

        depth_losses = loss(
            outputs["depth"],
            target=inputs["depth"],
            mask=inputs["depth_mask"].clone(),
            si=si,
        )
        losses["opt"][loss.name] = loss.weight * depth_losses.mean()

Perhaps I am misunderstanding the computation. Requesting your help in clarifying this.

Thank you!

namrata-jangid · Jun 11 '25 12:06

Here is an example:

batch_size = 8
num_copies = 1
si: tensor([0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0', dtype=torch.int32)
self.integrated = 0.15

Using the current method (i.e. only changing the shape of var_error):

mean_error tensor([[ 0.8700],
        [ 0.0012],
        [ 0.0655],
        [-0.0843],
        [ 0.0229],
        [-0.0147],
        [-0.0198],
        [-0.0270]], device='cuda:0', grad_fn=<SqueezeBackward2>)

var_error tensor([0.0003, 0.0109, 0.0089, 0.0024, 0.0006, 0.0069, 0.0199, 0.0033],
       device='cuda:0', grad_fn=<MeanBackward1>)

out_loss tensor([[0.3375, 0.3529, 0.3500, 0.3406, 0.3380, 0.3471, 0.3654, 0.3420],
        [0.0189, 0.1049, 0.0947, 0.0499, 0.0268, 0.0834, 0.1413, 0.0586],
        [0.0316, 0.1080, 0.0980, 0.0560, 0.0369, 0.0872, 0.1436, 0.0638],
        [0.0377, 0.1099, 0.1001, 0.0596, 0.0422, 0.0896, 0.1450, 0.0670],
        [0.0209, 0.1053, 0.0951, 0.0507, 0.0282, 0.0839, 0.1416, 0.0592],
        [0.0198, 0.1051, 0.0949, 0.0502, 0.0274, 0.0836, 0.1414, 0.0588],
        [0.0204, 0.1052, 0.0950, 0.0505, 0.0279, 0.0838, 0.1415, 0.0591],
        [0.0216, 0.1055, 0.0953, 0.0510, 0.0287, 0.0841, 0.1417, 0.0595]],
       device='cuda:0', grad_fn=<SqrtBackward0>)

out_loss.mean() = 0.10882534831762314

Notice that for each sample, the SI log error value is actually the corresponding diagonal entry of out_loss. Taking the mean of out_loss therefore also averages the off-diagonal cross terms, which may not be the intention.
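
For example, taking the diagonal of the printed [8, 8] out_loss recovers exactly the per-sample values (they coincide with out_loss_changed below), and its mean differs from out_loss.mean():

        per_sample = torch.diagonal(out_loss)   # tensor([0.3375, 0.1049, 0.0980, 0.0596, 0.0282, 0.0836, 0.1415, 0.0595])
        print(per_sample.mean())                # ~0.1141, versus out_loss.mean() = 0.1088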

Now consider the output when the shapes of both mean_error and var_error are reduced:

mean_error tensor([ 0.8700,  0.0012,  0.0655, -0.0843,  0.0229, -0.0147, -0.0198, -0.0270],
       device='cuda:0', grad_fn=<MeanBackward1>)

var_error tensor([0.0003, 0.0109, 0.0089, 0.0024, 0.0006, 0.0069, 0.0199, 0.0033],
       device='cuda:0', grad_fn=<MeanBackward1>)

out_loss_changed tensor([0.3375, 0.1049, 0.0980, 0.0596, 0.0282, 0.0836, 0.1415, 0.0595],
       device='cuda:0', grad_fn=<SqrtBackward0>)

out_loss_changed.mean() = 0.11410681903362274

The final loss value for the batch changes significantly.
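
For reference, the change described above is simply to collapse mean_error in the same way as var_error before the two are combined; a sketch of the tail of forward with that change:

        if mean_error.ndim > 1:
            mean_error = mean_error.mean(dim=-1)   # proposed: reduce to [B], like var_error
        if var_error.ndim > 1:
            var_error = var_error.mean(dim=-1)

        if self.integrated > 0.0:
            scale_error = mean_error**2            # now [B]
            var_error = var_error + self.integrated * scale_error * (1 - si.int())

        out_loss = self.output_fn(var_error)       # shape [B], one value per sample
        return out_loss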

namrata-jangid · Jun 11 '25 14:06