
Details in EG3D Inversion

Open oneThousand1000 opened this issue 2 years ago • 48 comments

I have released my EG3D inversion code for your reference; you can find it here: EG3D-projector.


Thanks for the impressive work!

As you mentioned in the paper, you use Pivotal Tuning Inversion (PTI) to invert test images. PTI fine-tunes the EG3D parameters around a pivot latent code obtained by optimization. The pivot latent code is a "w" or "w+" code; however, it is correlated with the camera parameters fed to the mapping network. Will novel-view synthesis be affected by this camera-fixed latent code?

I also noticed that you set a hyper-parameter entangle = 'camera' in gen_videos.py, so it seems you have already considered this issue when rendering different views for a specific latent code. I tried 'condition' and 'both'; in those modes the camera parameters fed to the mapping network only control some unrelated semantic attributes (expression, clothes, ...). I think the [zs, c] fed to the mapping network can be regarded as a latent code of shape [1, 512+25]. Does c influence the camera view of the subsequent synthesis?

I have now reproduced the PTI inversion of EG3D; please see the video below. I input the re-aligned 00000.png and its camera parameters (from dataset.json), optimize the latent code 'w', and use it as the pivot to fine-tune EG3D.
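For anyone comparing implementations, here is a minimal sketch of the second PTI stage as I understand it for EG3D: the pivot stays fixed and only the generator weights are tuned (the w projection itself follows the standard PTI projector, see further below). The loss weights, step count, and the lpips_fn helper are assumptions for illustration, not the official settings.

```python
import copy
import torch
import torch.nn.functional as F

# Assumed inputs: G (pretrained TriPlaneGenerator), w_pivot [1, num_ws, 512] from the
# projection step, cam [1, 25] from dataset.json, target [1, 3, 512, 512] in [-1, 1],
# and lpips_fn, an LPIPS distance module.
G_tuned = copy.deepcopy(G).train().requires_grad_(True)
opt = torch.optim.Adam(G_tuned.parameters(), lr=3e-4)

for step in range(350):                                   # a few hundred steps, as in PTI
    synth = G_tuned.synthesis(w_pivot, cam)['image']      # render the fixed pivot
    loss = lpips_fn(synth, target).mean() + F.mse_loss(synth, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```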

The result looks a little strange. I want to know if my implementation is consistent with yours!


I think I've figured out why the camera parameters that are input to the mapping network can't control the camera view; please refer to Generator Pose Conditioning.
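In other words, the pose passed to the mapping network is only a conditioning signal (generator pose conditioning); the viewpoint of the rendered image comes from the camera passed to synthesis. A minimal sketch of that split as I read gen_videos.py (c_cond and c_render are my names, not the official ones):

```python
# c_cond: camera used only to condition the mapping network (kept fixed, e.g. a roughly
# frontal pose); c_render: camera of the view you actually want to render.
ws = G.mapping(z, c_cond, truncation_psi=0.7)   # the "camera-fixed" latent code
img = G.synthesis(ws, c_render)['image']        # the viewpoint follows c_render only
```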

https://user-images.githubusercontent.com/32099648/173868809-4cd3fc8b-774b-4068-865e-358f60e19411.mp4

oneThousand1000 avatar Jun 15 '22 15:06 oneThousand1000

Is it possible to share the code for the PTI inversion?

cantonioupao avatar Jun 16 '22 10:06 cantonioupao

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

zhangqianhui avatar Jun 18 '22 16:06 zhangqianhui

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

Yes, my code is based on the w projector of PTI. It seems that the inversion works best on portraits that look straight ahead. I think I achieved the same performance as the authors'.
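For completeness, a minimal sketch of that first stage: only a single w is optimized, the camera parameters c from dataset.json stay fixed, and the loss is LPIPS as in the PTI w projector. The learning rate, step count, initialization, and the lpips_fn helper are placeholders rather than the exact PTI settings.

```python
import torch

# Assumed inputs: G (frozen TriPlaneGenerator), target [1, 3, 512, 512] in [-1, 1],
# c [1, 25] from dataset.json, lpips_fn = an LPIPS distance module.
with torch.no_grad():
    z_samples = torch.randn(10000, G.z_dim, device=c.device)
    ws_samples = G.mapping(z_samples, c.repeat(10000, 1))   # [10000, num_ws, 512]
    w_avg = ws_samples.mean(dim=0, keepdim=True)[:, :1]     # [1, 1, 512] mean-w init

w_opt = w_avg.clone().requires_grad_(True)                  # the pivot-to-be
opt = torch.optim.Adam([w_opt], lr=0.01)

for step in range(500):
    ws = w_opt.repeat(1, ws_samples.shape[1], 1)            # broadcast w over all layers
    synth = G.synthesis(ws, c)['image']
    loss = lpips_fn(synth, target).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```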

https://user-images.githubusercontent.com/32099648/174448309-4196294d-5263-4abf-a7ed-4a1e32ade48d.MOV

oneThousand1000 avatar Jun 18 '22 16:06 oneThousand1000

Ok, great!

zhangqianhui avatar Jun 19 '22 06:06 zhangqianhui

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

Yes, my code is based on the w projector of PTI. It seems that the inversion works best on portraits that look straight ahead. I think I achieved the same performance as the authors'.

IMG_0604.MOV

Hi,

I tried to invert this portrait, but it seems that the optimization can't recover a correct eyeglasses shape, even though it produces a reasonable result in the input view. I want to ask whether it is because you get the eyeglasses after the pivot optimization (which I can't achieve), so that this shape is preserved during the generator optimization? Or do you add other regularization?

Thank you for your time!

https://user-images.githubusercontent.com/46376580/174758112-52a91f58-3c74-4a2c-8d5a-e4a7c94aedfe.mp4

jiaxinxie97 avatar Jun 21 '22 08:06 jiaxinxie97

@jiaxinxie97 Hi jiaxinxie97, I used the original EG3D checkpoint to generate a video for the latent code, and it seems that the eyeglasses are reconstructed successfully, which indicates that I got the eyeglasses before the pivot optimization.

I think you can check your projector code; I used this one (both w and w_plus are OK). The zip file I uploaded contains the input re-aligned image and the input camera parameters, so you can check whether they are consistent with yours.

Video generated by the original EG3D checkpoint: https://user-images.githubusercontent.com/32099648/174772757-d316bc1d-de52-49a4-a863-6de166000450.mp4

input re-aligned image and the input camera parameters: 01457.zip

oneThousand1000 avatar Jun 21 '22 09:06 oneThousand1000

Thanks! I also use the PTI repo, but it is strange that I can't reconstruct the eyeglasses using w or w+ space optimization; I will check! Since the original EG3D checkpoint does not have named_buffers(), I removed the reg_loss. Will that affect the results?

jiaxinxie97 avatar Jun 21 '22 10:06 jiaxinxie97

Thanks! I also use the PTI repo, but it is strange that I can't reconstruct the eyeglasses using w or w+ space optimization; I will check! Since the original EG3D checkpoint does not have named_buffers(), I removed the reg_loss. Will that affect the results?

Hi, the noise buffers that the regularization needs live in the StyleGAN2 synthesis network, which you can find inside the StyleGAN2Backbone (self.backbone) of the TriPlaneGenerator.

Try to use G.backbone.synthesis.named_buffers() instead of G.named_buffers(), and add the reg_loss.
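For reference, a sketch of how the noise regularization from the StyleGAN2/PTI projector can be hooked up to EG3D's backbone as suggested above; this mirrors the original projector's multi-scale penalty rather than adding anything new.

```python
import torch
import torch.nn.functional as F

# The per-layer noise buffers live in the StyleGAN2 backbone of the TriPlaneGenerator.
noise_bufs = {name: buf for (name, buf) in G.backbone.synthesis.named_buffers()
              if 'noise_const' in name}

def noise_reg_loss(noise_bufs):
    # Multi-scale autocorrelation penalty, as in the StyleGAN2/PTI projectors.
    reg_loss = 0.0
    for v in noise_bufs.values():
        noise = v[None, None, :, :]                 # [1, 1, H, W]
        while True:
            reg_loss += (noise * torch.roll(noise, shifts=1, dims=3)).mean() ** 2
            reg_loss += (noise * torch.roll(noise, shifts=1, dims=2)).mean() ** 2
            if noise.shape[2] <= 8:
                break
            noise = F.avg_pool2d(noise, kernel_size=2)
    return reg_loss

# total_loss = reconstruction_loss + 1e5 * noise_reg_loss(noise_bufs)  # weight as in the projector
```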

oneThousand1000 avatar Jun 21 '22 10:06 oneThousand1000

Hi, thank you! Using G.backbone.synthesis.named_buffers(), I got a reasonable result for the eyeglasses.

jiaxinxie97 avatar Jun 21 '22 22:06 jiaxinxie97

Hi, @oneThousand1000

Did you set both the z and c as the trainable parameters during the GAN inversion? I guess fixing the c (which can be obtained from the dataset.json) and only inverting the z is more reasonable. What do you think?

e4s2022 avatar Jun 22 '22 06:06 e4s2022

Hi, @oneThousand1000

Did you set both the z and c as the trainable parameters during the GAN inversion? I guess fixing the c (which can be obtained from the dataset.json) and only inverting the z is more reasonable. What do you think?

I set w or w_plus as trainable parameters and fix the c.

oneThousand1000 avatar Jun 22 '22 06:06 oneThousand1000

Got it, thanks for your reply.

BTW, did you follow the FFHQ preprocessing steps in EG3D (i.e., re-align the in-the-wild images to 1500 and then resize to 512), or did you directly use the well-aligned 1024 FFHQ images and just resize them to 512?

e4s2022 avatar Jun 22 '22 06:06 e4s2022

Hi @oneThousand1000,

Do you have any out-of-domain results? I tried PTI myself with the FFHQ checkpoint; it works well on the joker image but fails on the CelebA-HQ dataset. (Attached images: celeba_out, joker_out.)

mlnyang avatar Jun 23 '22 03:06 mlnyang


Got it, thanks for your reply.

BTW, did you follow the FFHQ preprocessing steps in EG3D (i.e., re-align the in-the-wild images to 1500 and then resize to 512), or did you directly use the well-aligned 1024 FFHQ images and just resize them to 512?

I followed the FFHQ preprocessing steps in EG3D.

oneThousand1000 avatar Jun 23 '22 03:06 oneThousand1000

@mlnyang, I got similar results to yours.

I use the well-aligned & cropped FFHQ images (at 1024 resolution) and then resize them to 512 for the subsequent PTI inversion. To be more specific, say I choose "00999.png" as the input. Since the camera parameters (25 = 4x4 + 3x3) are provided in dataset.json, I use them directly. The camera parameters are fixed while the w latent code is trainable. The following are my results: image
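For anyone puzzled by the 25 numbers: as I understand the label format, they are the flattened 4x4 cam2world extrinsic matrix followed by the flattened 3x3 normalized intrinsics. A small sketch of unpacking one entry (camera_params is a hypothetical variable holding the 25-number list from dataset.json):

```python
import numpy as np

c = np.asarray(camera_params, dtype=np.float32)   # the 25 values from dataset.json
extrinsics = c[:16].reshape(4, 4)                 # cam2world pose matrix
intrinsics = c[16:].reshape(3, 3)                 # normalized intrinsics (focal ~4.26, principal point 0.5)
```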

However, when I follow the FFHQ preprocessing steps in EG3D, which basically consist of (1) aligning & cropping the in-the-wild image to size 1500, (2) re-aligning to 1024 & center-cropping to 700, and (3) resizing to 512, the results look good: image
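To make steps (2) and (3) concrete, here is a minimal sketch of the center crop and resize; step (1), the alignment itself, should be done with the repo's own preprocessing scripts, and the file names below are hypothetical.

```python
from PIL import Image

img = Image.open('realigned_1024.png')               # re-aligned 1024x1024 image from step (2)
left = (img.width - 700) // 2
top = (img.height - 700) // 2
img = img.crop((left, top, left + 700, top + 700))   # center crop to 700x700
img = img.resize((512, 512), Image.LANCZOS)          # final 512x512 EG3D input
img.save('eg3d_input_512.png')
```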

I guess the difference in the underlying preprocessing might be the reason. When you tried PTI on the joker image, how did you preprocess it?

e4s2022 avatar Jun 23 '22 03:06 e4s2022

Hi @bd20222, thanks for sharing your work.

I think that's the main reason. Actually, the joker image originally came from the PTI repo, so maybe it was already preprocessed with the FFHQ alignment.

mlnyang avatar Jun 23 '22 04:06 mlnyang

I took a look at the PTI alignment script; it seems to be the same as the original FFHQ one.

I inspected the EG3D preprocessing and compared it with the original FFHQ. AFAIK, there is no center-cropping step in the original FFHQ preprocessing, so you will find that the faces used in EG3D show some vertical translation. I guess the well-trained EG3D model has captured this pattern, which results in the blurry PTI inversion, and the subsequently synthesized novel views look like a mixture of two faces.

It's interesting, though, that the joker example works well.

e4s2022 avatar Jun 23 '22 04:06 e4s2022

Hi, please follow "Preparing datasets" in the README to get re-aligned images. According to https://github.com/NVlabs/eg3d/issues/16#issuecomment-1151563364, the original FFHQ images do not work with the camera parameters in dataset.json; you should predict the camera parameters for the original FFHQ images yourself.

oneThousand1000 avatar Jun 23 '22 04:06 oneThousand1000

@oneThousand1000

Yeah, I agree. For those who want to directly use the well-aligned 1024 FFHQ images, you have to predict the camera parameters yourself with Deep3DFace_pytorch. But I haven't tested this on the EG3D pre-trained model.

e4s2022 avatar Jun 23 '22 04:06 e4s2022

@oneThousand1000

Yeah, I agree. For those who want to directly use the well-aligned 1024 FFHQ images, you have to predict the camera parameters yourself with Deep3DFace_pytorch.

You can email the author and ask for the pose extraction code. Or refer to https://github.com/NVlabs/eg3d/issues/18

oneThousand1000 avatar Jun 23 '22 04:06 oneThousand1000

Oh I see... center cropping was the problem. I just tried other examples from PTI and they didn't work. It is strange that the joker image works well. Thanks for your help!! :)

mlnyang avatar Jun 23 '22 04:06 mlnyang

@oneThousand1000 Do you use the noise regularization loss in the first GAN inversion step?

zhangqianhui avatar Jun 23 '22 08:06 zhangqianhui

@oneThousand1000 Do you use the noise regularization loss in the first GAN inversion step?

See https://github.com/NVlabs/eg3d/issues/28#issuecomment-1161560077

oneThousand1000 avatar Jun 23 '22 08:06 oneThousand1000

Thanks

zhangqianhui avatar Jun 23 '22 10:06 zhangqianhui

@oneThousand1000 I still have one question: will the parameters of the tri-plane decoder also be tuned? I used another 3D GAN model (StyleSDF), which doesn't have the tri-plane generator, and I found that fine-tuning the MLP parameters harmed the geometry.

zhangqianhui avatar Jun 24 '22 05:06 zhangqianhui

Hi, @oneThousand1000,

I tried to use PTI to get the pivot of an image; then, in gen_videos.py, I used the pivot to set zs, which is originally set from random seeds: "zs = torch.from_numpy(np.stack([np.random.RandomState(seed).randn(G.z_dim) for seed in all_seeds])).to(device)". I got the video, but the image is totally different from the previous one. Did I miss anything about the connection between PTI and EG3D? I noticed you said "I optimize the latent code 'w' and use it as the pivot to finetune eg3d." Do you mean we need to generate a dataset and call "train.py" to fine-tune? If we fine-tune anyway, why do we need PTI? I thought PTI was meant to get the latent code as conditioning for EG3D. Thanks for your help.

BiboGao avatar Jun 29 '22 05:06 BiboGao

Hi, @oneThousand1000,

I tried to use PTI to get the pivot of an image; then, in gen_videos.py, I used the pivot to set zs, which is originally set from random seeds: "zs = torch.from_numpy(np.stack([np.random.RandomState(seed).randn(G.z_dim) for seed in all_seeds])).to(device)". I got the video, but the image is totally different from the previous one. Did I miss anything about the connection between PTI and EG3D? I noticed you said "I optimize the latent code 'w' and use it as the pivot to finetune eg3d." Do you mean we need to generate a dataset and call "train.py" to fine-tune? If we fine-tune anyway, why do we need PTI? I thought PTI was meant to get the latent code as conditioning for EG3D. Thanks for your help.

Hi, you need to feed zs into the mapping network of EG3D to get the w or ws latent code, and then optimize that w or ws. Please refer to https://github.com/danielroich/PTI/tree/main/training/projectors or the StyleGAN paper for the definition of the w/ws latent codes.
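A minimal sketch of that step (c and device are assumed to be the target's camera label and the torch device; the optimization loop itself is the same as in the PTI projectors linked above):

```python
import torch

z = torch.randn(1, G.z_dim, device=device)    # or from a fixed seed, as in gen_videos.py
ws = G.mapping(z, c)                          # [1, num_ws, 512]: the w+ latent code
ws = ws.detach().requires_grad_(True)         # the pivot to optimize is ws, not z
img = G.synthesis(ws, c)['image']             # rendered image for the reconstruction loss
```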

oneThousand1000 avatar Jun 29 '22 06:06 oneThousand1000

I got it, thanks.

BiboGao avatar Jun 29 '22 20:06 BiboGao

Thanks for your help, I have also obtained realistic results.

zhangqianhui avatar Jun 30 '22 08:06 zhangqianhui

@bd20222, hello! Which dataset.json do you use? I use ffhq-dataset-v2.json, but it contains no camera parameters.

lyx0208 avatar Jul 04 '22 04:07 lyx0208

@lyx0208, hi, you have to preprocess the dataset in advance. The details can be found here. For your question, the camera parameters provided by the author can be downloaded from https://github.com/NVlabs/eg3d/blob/71ef469df0095c609b2b151127774ea74a1bf17c/dataset_preprocessing/ffhq/runme.py#L48-L50

e4s2022 avatar Jul 04 '22 04:07 e4s2022

@bd20222, got it, thanks!

lyx0208 avatar Jul 04 '22 09:07 lyx0208

FYI, we added additional scripts that preprocess in-the-wild images to be compatible with the FFHQ checkpoints. Hope that is useful. https://github.com/NVlabs/eg3d/issues/18#issuecomment-1200366872

luminohope avatar Jul 31 '22 07:07 luminohope

FYI, we added additional scripts that preprocess in-the-wild images to be compatible with the FFHQ checkpoints. Hope that is useful. #18 (comment)

Hi! I found that all the faces in the FFHQ Processed Data (downloaded from the Google Drive link you provided) are rotated so that the two eyes lie on a horizontal line. But the uploaded scripts seem to do no rotation. Does this matter?

The first image I uploaded is the one from the FFHQ Processed Data; I processed the raw 00000 image using your uploaded scripts and got the second image.

I also found that the uploaded scripts output camera parameters that differ from those in dataset.json. Maybe this is caused by the missing rotation?

The camera parameters predicted by the uploaded scripts:
[ 0.944381833076477, -0.011193417012691498, 0.32866042852401733, -0.828210463398311, -0.010220649652183056, -0.9999367594718933, -0.004687247332185507, 0.005099154064645238, 0.32869213819503784, 0.0010674281511455774, -0.9444364905357361, 2.5698329570120664, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0 ]

The camera parameters in dataset.json:
[ 0.9422833919525146, 0.034289587289094925, 0.3330560326576233, -0.8367999667889383, 0.03984849900007248, -0.9991570711135864, -0.009871904738247395, 0.017018394869192363, 0.33243677020072937, 0.022573914378881454, -0.9428553581237793, 2.566997504832856, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0 ]

(Attached images: img00000000, 00000)

oneThousand1000 avatar Jul 31 '22 08:07 oneThousand1000

Hi! I think the [image align code](https://github.com/Puzer/stylegan-encoder/blob/master/align_images.py) provided by [stylegan-encoder](https://github.com/Puzer/stylegan-encoder) may be useful for rotating the image, if you want to get a re-aligned image similar to the one in the FFHQ Processed Data.

Maybe for in-the-wild images, the rotation is an unnecessary step.

I modified the code and used it to rotate and crop the image; after rotation the resulting image seems to be consistent with the one in the FFHQ Processed Data.

oneThousand1000 avatar Jul 31 '22 09:07 oneThousand1000