doc3D-dataset
Question about loss and training of refinement network
Hi! I have two questions about the refinement network:
- According to the paper, the loss is calculated as the L1 norm of the difference between the GT shading map and the predicted shading map. Also, we can get the shaded image I as the Hadamard (elementwise) product of the shading-free image and the shading map: I = A ⊙ S. But there is no GT shading map in the dataset. Does that mean you calculate the ground-truth S as I / A (elementwise) for training, where I is the input image and A is the corresponding albedo?
- The refinement network consists of two UNet-like nets. The first net takes I (the input image) as input and produces N (normals); the second one takes N and produces S (the shading map). But the loss contains only the shading map, so technically this is one network and I should use one optimizer for training. Am I right?
- Yes, that's how the shading map is calculated. Make sure to scale the values of the map.
- The normal estimation network is pretrained with an L2 loss. While training the shading estimation network, we fix the weights of the normal estimation network.
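In case it helps, here is a minimal sketch of that two-stage setup, assuming PyTorch; the U-Net modules and the data loader are hypothetical placeholders, not names from this repo:

import torch
import torch.nn as nn

def train_shading_stage(normal_net, shading_net, loader, lr=1e-4):
    # Stage 2: train the shading net with L1 while the pretrained normal net stays fixed.
    # normal_net / shading_net are assumed to be UNet-like nn.Modules (placeholders).
    for p in normal_net.parameters():   # freeze the pretrained normal estimation network
        p.requires_grad = False
    normal_net.eval()

    # Only the shading network's parameters are optimized.
    optimizer = torch.optim.Adam(shading_net.parameters(), lr=lr)
    l1 = nn.L1Loss()

    for img, gt_shd in loader:          # batches of (input image, GT shading map)
        with torch.no_grad():
            normals = normal_net(img)   # fixed weights, no gradient tracking
        pred_shd = shading_net(normals)
        loss = l1(pred_shd, gt_shd)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()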
- And how do you manage zero values? I'm a little concerned, as I have a zero pixel in the albedo, [0, 0, 0], and a non-zero pixel in the image, [0.2974836, 0.41949257, 0.48999533]. How can I get this if the image is computed as I = A ⊙ S? [0.2974836, 0.41949257, 0.48999533] = [0, 0, 0] * [s1, s2, s3]
Do something like this:
shdmap = np.divide(img, alb, out=np.zeros_like(img), where=alb != 0).astype(np.float32)
And should the scaling look like this?
shdmap = (shdmap - shdmap.min()) / (shdmap.max() - shdmap.min())
We printed shdmap.max() and shdmap.min() using your code; given the values we get, how should we scale them?
@Enuvesta Hi. Did you reproduce good results?
Do something like this:
shdmap = np.divide(img, alb, out=np.zeros_like(img), where=alb != 0).astype(np.float32)
We didn't scale the values; we used ReLU as the final activation for the shading map regression. However, people often use a log scale.
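For reference, the log-scale option mentioned above could look something like the sketch below; this is an assumption about what "log scale" means here, not the code used in the paper, and it assumes img and alb are float32 arrays in [0, 1]:

import numpy as np

def log_shading_target(img, alb):
    # Sketch: compute the shading map, then compress its dynamic range with log(1 + S).
    shdmap = np.divide(img, alb, out=np.zeros_like(img), where=alb != 0).astype(np.float32)
    return np.log1p(shdmap)  # invert with np.expm1 at inference to recover the linear map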
@sagniklp Thanks for your reply.
For training the refinement network, the shape of the input image is 256×256 and the output shading map is 128×128. For inference, is the 128×128 shading map resized to the original shape of the distorted image, like the operation of the geometric rectification network? Am I right? Thank you.
Yeah, correct!
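For completeness, a minimal sketch of that inference-time resize, assuming OpenCV (the choice of bilinear interpolation is an assumption, not taken from the repo):

import cv2

def upsample_shading(shdmap_128, orig_h, orig_w):
    # Resize the predicted 128x128 shading map back to the distorted image's original size.
    # cv2.resize expects the target size as (width, height).
    return cv2.resize(shdmap_128, (orig_w, orig_h), interpolation=cv2.INTER_LINEAR)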