
Bad projection?

LoickCh opened this issue 3 years ago · 8 comments

Hello,

I have noticed something strange in the definitions of "generate_planes" and "project_onto_planes". If I have understood correctly, you define three transformation matrices in "generate_planes" that are used to project coordinates in "project_onto_planes", before keeping only the first two coordinates of the projection.
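For reference, the two functions in question look roughly like this (paraphrased from the repository's volumetric rendering code, so details such as variable names may differ slightly):

import torch

def generate_planes():
    # One 3x3 matrix per tri-plane; each row is one axis of that plane.
    return torch.tensor([[[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]],
                         [[1, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0]],
                         [[0, 0, 1],
                          [1, 0, 0],
                          [0, 1, 0]]], dtype=torch.float32)

def project_onto_planes(planes, coordinates):
    # coordinates: (N, M, 3), planes: (n_planes, 3, 3)
    N, M, _ = coordinates.shape
    n_planes, _, _ = planes.shape
    coordinates = coordinates.unsqueeze(1).expand(-1, n_planes, -1, -1).reshape(N * n_planes, M, 3)
    inv_planes = torch.linalg.inv(planes).unsqueeze(0).expand(N, -1, -1, -1).reshape(N * n_planes, 3, 3)
    projections = torch.bmm(coordinates, inv_planes)
    return projections[..., :2]  # keep only the first two coordinates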


Problem:

import torch
# generate_planes / project_onto_planes as defined in eg3d's renderer code

B = 2
N_rays = 11
coordinates = torch.randn(B, N_rays, 3)
planes = generate_planes()
out = project_onto_planes(planes, coordinates)

If we set P = coordinates[0][0], then out[:3, 0, :] contains the three projections of P, so it is supposed to be: [P[0], P[1]], [P[1], P[2]], [P[2], P[0]]

However, I get: [P[0], P[1]], [P[0], P[2]], [P[2], P[0]]

In other words: [(X, Y), (X, Z), (Z, X)] instead of [(X, Y), (Y, Z), (Z, X)].


Reason: I think I have found the cause. You defined the planes with the following matrices (call them M1, M2, M3):

M1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M2 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
M3 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

Their inverses are:

M1^{-1} = [[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]]

M2^{-1} = [[1, 0, 0],
           [0, 0, 1],
           [0, 1, 0]]

M3^{-1} = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]

If I have a point P = (X, Y, Z), I get:

P @ M1^{-1} = (X, Y, Z)
P @ M2^{-1} = (X, Z, Y)
P @ M3^{-1} = (Z, X, Y)

Then, keeping only the first two coordinates, I get: [(X, Y), (X, Z), (Z, X)]
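A quick standalone check of this algebra (a minimal sketch using torch.linalg.inv directly rather than the repo's helpers):

import torch

P = torch.tensor([1., 2., 3.])  # stands for (X, Y, Z)
M = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                  [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
                  [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

for i, Mi in enumerate(M):
    proj = P @ torch.linalg.inv(Mi)  # row vector times inverse, as in the code
    print(i, proj.tolist(), proj[:2].tolist())

# Expected output (up to float formatting):
# 0 [1.0, 2.0, 3.0] [1.0, 2.0]  -> (X, Y)
# 1 [1.0, 3.0, 2.0] [1.0, 3.0]  -> (X, Z)
# 2 [3.0, 1.0, 2.0] [3.0, 1.0]  -> (Z, X)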


Possible solution: Update "generate_planes" to:

torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
              [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
              [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)
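Running the same check as above with these proposed matrices (again just a standalone sketch) gives the expected pairs:

import torch

P = torch.tensor([1., 2., 3.])  # (X, Y, Z)
M_fixed = torch.tensor([[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                        [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
                        [[0, 0, 1], [1, 0, 0], [0, 1, 0]]], dtype=torch.float32)

for i, Mi in enumerate(M_fixed):
    print(i, (P @ torch.linalg.inv(Mi))[:2].tolist())

# Expected output (up to float formatting):
# 0 [1.0, 2.0]  -> (X, Y)
# 1 [2.0, 3.0]  -> (Y, Z)
# 2 [3.0, 1.0]  -> (Z, X)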

Do not hesitate to tell me if I am misunderstanding something.

LoickCh avatar Jun 21 '22 12:06 LoickCh

I have a similar question. The inv_planes in the code is actually

tensor([[[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]],

        [[1., 0., 0.],
         [0., 0., 1.],
         [0., 1., 0.]],

        [[0., 1., 0.],
         [0., 0., 1.],
         [1., 0., 0.]]])

According to the PyTorch bmm doc, the code multiplies the inv_planes on the right. If the input coordinates are [[x, y, z]], then the bmm result will be [xy, xz, zx]. However, if we multiply the inv_planes on the left (after transposing the coordinates, of course), the result will be [xy, xz, yz].

Not a hundred percent sure.
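A small standalone sketch of both orderings (made-up point values, plain torch rather than the repo's helper functions):

import torch

planes = torch.tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]],
                       [[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]],
                       [[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]]])
inv_planes = torch.linalg.inv(planes)                  # (3, 3, 3)
coords = torch.tensor([[7., 8., 9.]]).expand(3, 1, 3)  # same (x, y, z) point for each plane

right = torch.bmm(coords, inv_planes)[..., :2]                                 # inv_planes on the right
left = torch.bmm(inv_planes, coords.transpose(1, 2)).transpose(1, 2)[..., :2]  # inv_planes on the left

print(right.squeeze(1).tolist())  # [[7.0, 8.0], [7.0, 9.0], [9.0, 7.0]] -> [xy, xz, zx]
print(left.squeeze(1).tolist())   # [[7.0, 8.0], [7.0, 9.0], [8.0, 9.0]] -> [xy, xz, yz]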

e4s2022 avatar Jun 23 '22 09:06 e4s2022

@ericryanchan also found this question about the projection

41xu avatar Jul 13 '22 13:07 41xu

I ran into the same problem, and I changed the projection to [xy, xz, yz] with the following code:

projections = torch.bmm(inv_planes, torch.transpose(coordinates, 1, 2))
return torch.transpose(projections, 1, 2)[..., :2]
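For context, here is a minimal sketch of how those two lines might slot into project_onto_planes; the surrounding expand/reshape code is assumed from the repo and may differ slightly:

def project_onto_planes(planes, coordinates):
    # coordinates: (N, M, 3), planes: (n_planes, 3, 3)
    N, M, _ = coordinates.shape
    n_planes, _, _ = planes.shape
    coordinates = coordinates.unsqueeze(1).expand(-1, n_planes, -1, -1).reshape(N * n_planes, M, 3)
    inv_planes = torch.linalg.inv(planes).unsqueeze(0).expand(N, -1, -1, -1).reshape(N * n_planes, 3, 3)
    # changed part: multiply inv_planes on the left instead of the right
    projections = torch.bmm(inv_planes, torch.transpose(coordinates, 1, 2))
    return torch.transpose(projections, 1, 2)[..., :2]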

After that, the visual quality of the generated images appears to have deteriorated instead.

[attached: two sample images]

Both images are generated at 201kimg (Top: Official version, Bottom: Changed version)

WeichuangLi avatar Jul 13 '22 15:07 WeichuangLi

@WeichuangLi

Hey, I got similarly deteriorated results to your top image. Did you strictly preprocess FFHQ according to the given script? In my experience, if you use the original well-cropped FFHQ, the training results will look like the top image. I can confirm it is caused by the mismatch between the camera poses and the face images, since I got the expected training results after re-cropping.

I think you can try re-cropping the images from the FFHQ in-the-wild images first, then train the model with the changed projection code to see if it works. BTW, if you cannot process all the images (70k in total), you can use a subset, say 5k images. Please let me know if you have any updates, thank you.

e4s2022 avatar Jul 14 '22 03:07 e4s2022


Hi @bd20222,

Thanks for your kind advice. I used the same dataset as the official version, which I obtained by emailing Eric.

After training for a longer time, the model seems to generate much better results. I do not have a good explanation for this; personally, I think it might be caused by different initializations. I have also attached the generated images below for your perusal.

[attached: generated image samples]

As for the projection, I think both strategies should work, since each one still involves all the coordinates; even though the original result is [xy, xz, zx], it does include the z-coordinate. But I think the revised version aligns better with the strategy described in the paper and with my intuition.

Best regards, Weichuang

WeichuangLi avatar Jul 14 '22 08:07 WeichuangLi

Yeah, I can also get similar training results by following how Eric processed the dataset; below are my generated faces ([xy, xz, zx] version): [attached: generated face samples]

I agree, since all the coordinates are still involved. So the faces you attached above were generated with the revised version, i.e., [xy, xz, yz]?

e4s2022 avatar Jul 14 '22 08:07 e4s2022


Sorry for leaving out that information. Yes, the images attached above were generated with the revised version at 2217kimg. With longer training, the results might be even better.

WeichuangLi avatar Jul 14 '22 08:07 WeichuangLi

Cool, mine is at 2400kimg, but I used a subset of FFHQ to train (~5k training images).

Have a nice day. : )

e4s2022 avatar Jul 14 '22 08:07 e4s2022

Please see the relevant post here: https://github.com/NVlabs/eg3d/issues/67

luminohope avatar Sep 19 '22 21:09 luminohope