
poor quality of one random image

Open maobenz opened this issue 2 years ago • 11 comments

I just tried a random image and found that the quality of the inversion is very poor.

Do you have any ideas about it?


maobenz avatar Oct 17 '22 11:10 maobenz

Did you align the input image according to https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py ?
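
(For anyone hitting the same problem, here is a minimal sketch of what that alignment step looks like. It assumes the upstream eg3d repo and its preprocessing dependencies are already set up; the --indir flag and the paths below are assumptions and should be checked against the script's own argument parser.)

```python
# Hedged sketch: run the upstream alignment on a folder of in-the-wild photos.
# Assumptions (not verified here): the eg3d repo is cloned at EG3D_ROOT, its
# preprocessing dependencies (e.g. the Deep3DFaceRecon_pytorch step) are set up,
# and the script accepts an --indir flag pointing at the raw images.
import subprocess
from pathlib import Path

EG3D_ROOT = Path("/path/to/eg3d")          # hypothetical clone location
RAW_IMAGES = Path("/path/to/raw_images")   # hypothetical folder of unaligned photos

subprocess.run(
    ["python", "preprocess_in_the_wild.py", f"--indir={RAW_IMAGES}"],
    cwd=EG3D_ROOT / "dataset_preprocessing" / "ffhq",  # the script lives here
    check=True,
)
```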

oneThousand1000 avatar Oct 19 '22 16:10 oneThousand1000

I am also getting poor results when following the steps in the original eg3d repo (an in-the-wild image, processed with the preprocessing steps, using the FFHQ pretrained model). The arguments used are all defaults / the same as the ones in the README. Do you have any ideas to improve the results? Thanks!

test img:

img00000366

results:

https://user-images.githubusercontent.com/46330265/213952911-8ca9a2a8-b15a-4a5b-9d72-6b12861fdda2.mp4

luchaoqi avatar Jan 23 '23 01:01 luchaoqi

Hey, I think it is caused by both the EG3D model itself and my simple inversion project. For the EG3D model, performance on extreme poses is much worse than on frontal views, due to the imbalanced pose distribution in FFHQ; it is still a challenging problem. For this simple inversion project, the input image only provides information from a single view, so it is hard for PTI to generate full-view results. I recommend you use a better inversion method, e.g., https://github.com/jiaxinxie97/HFGI3D
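
For context on why a single view limits PTI here, a rough sketch of the two-stage idea (first optimize a pivot latent for the observed view, then fine-tune the generator around it). This is an illustration with assumed names (G, lpips_loss, the hyper-parameters), not the exact code in this repo:

```python
# Hedged sketch of PTI-style single-view inversion for an eg3d-like generator.
# Assumptions: G.synthesis(ws, c) returns {'image': tensor}, `target` is the
# aligned target image tensor, `c` its 25-dim camera label, `w_init` an initial
# W+ latent of shape [1, num_ws, 512], and `lpips_loss` a perceptual loss
# (e.g. lpips.LPIPS). Step counts and learning rates are illustrative only.
import copy
import torch
import torch.nn.functional as F

def pti_invert(G, lpips_loss, target, c, w_init, w_steps=500, g_steps=350):
    # Stage 1: optimize the latent "pivot" with the generator frozen.
    w = w_init.clone().requires_grad_(True)
    opt_w = torch.optim.Adam([w], lr=1e-2)
    for _ in range(w_steps):
        img = G.synthesis(w, c)["image"]
        loss = lpips_loss(img, target).mean() + F.mse_loss(img, target)
        opt_w.zero_grad()
        loss.backward()
        opt_w.step()

    # Stage 2 (the "pivotal tuning"): freeze the pivot and fine-tune the
    # generator weights. This recovers detail for the observed view, but it
    # cannot invent content for regions the single input view never shows.
    w_pivot = w.detach()
    G_tuned = copy.deepcopy(G).train().requires_grad_(True)
    opt_g = torch.optim.Adam(G_tuned.parameters(), lr=3e-4)
    for _ in range(g_steps):
        img = G_tuned.synthesis(w_pivot, c)["image"]
        loss = lpips_loss(img, target).mean() + F.mse_loss(img, target)
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
    return w_pivot, G_tuned
```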

oneThousand1000 avatar Jan 23 '23 02:01 oneThousand1000

Actually, this repo is just a simple implementation of the projector mentioned in EG3D, not the best choice for projecting an image into EG3D's latent space :)

oneThousand1000 avatar Jan 23 '23 02:01 oneThousand1000

Thanks! But from the official website, it seems they are able to get pretty good results with just a single image + PTI.

https://user-images.githubusercontent.com/46330265/213961159-71308bd9-0ffc-4abc-b45c-8925cc5cb0d5.mp4

luchaoqi avatar Jan 23 '23 03:01 luchaoqi

The input image you used has the ears occluded, while the input images in the video contain more complete information. This repo cannot generate regions that are occluded.

You can see the results I generated using my repo: https://github.com/NVlabs/eg3d/issues/28#issuecomment-1159512947; here are the re-aligned input image and the input camera parameters: 01457.zip 01457

oneThousand1000 avatar Jan 23 '23 03:01 oneThousand1000

Weird that I am getting a slightly different camera matrix than yours after following pytorch_3d_recon:

mine:

            [0.9982488751411438,   0.01629943959414959,  -0.056863944977521896, 0.14564249100599475,
             0.010219544172286987, -0.9943544864654541,  -0.1056165024638176,   0.2914214260210597,
             -0.05826440826058388,  0.10485044121742249, -0.9927797317504883,   2.6802727132270365,
             0.0, 0.0, 0.0, 1.0,
             4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0]

yours:

array([ 0.99852723,  0.01640092, -0.05171374,  0.13343237,  0.01112113,
       -0.9948467 , -0.10077892,  0.27816952, -0.05310011,  0.10005538,
       -0.99356395,  2.6823157 ,  0.        ,  0.        ,  0.        ,
        1.        ,  4.2647    ,  0.        ,  0.5       ,  0.        ,
        4.2647    ,  0.5       ,  0.        ,  0.        ,  1.        ])

luchaoqi avatar Jan 24 '23 01:01 luchaoqi

Yes, the matrix I uploaded was directly obtained from the dataset.json (produced by https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/runme.py), and the image is from the FFHQ dataset. It is OK to use a slightly different matrix.
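
For readers comparing their own numbers: both arrays above follow eg3d's flattened 25-value camera label, 16 extrinsic values followed by 9 intrinsic values. A small sketch of how it decomposes; the helper name is just for illustration, but the layout matches the arrays posted above:

```python
import numpy as np

def split_camera_label(label):
    """Split a 25-value eg3d camera label into extrinsics and intrinsics.

    The first 16 values are the flattened 4x4 cam2world extrinsic matrix and
    the last 9 values are the flattened 3x3 normalized intrinsic matrix
    (focal length 4.2647 and principal point 0.5 in the arrays above).
    """
    label = np.asarray(label, dtype=np.float64)
    assert label.shape == (25,)
    return label[:16].reshape(4, 4), label[16:].reshape(3, 3)

# e.g. plugging in the two pose vectors quoted above and checking
# np.abs(np.array(mine) - np.array(yours)).max() gives a difference on the
# order of 1e-2, which is why a "slightly different" matrix still works fine.
```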

oneThousand1000 avatar Jan 24 '23 02:01 oneThousand1000

Hello, I have a follow-up question regarding your implementation. You might have also noticed the issue that the optimized latent code is fed directly into the generator's synthesis network, bypassing the mapping network, i.e., no camera conditioning information is included.
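
To make the question concrete, here is a small sketch of the two code paths being discussed, modeled on eg3d's sample scripts (the checkpoint name, the pose values, and the saved-latent file are assumptions; treat the call pattern as illustrative): in the sampling path the mapping network receives a pose-conditioning camera, whereas the inversion path feeds the optimized ws straight into synthesis, so no conditioning camera is ever applied.

```python
# Hedged sketch, assuming the eg3d codebase is importable (dnnlib, legacy) and
# that G.mapping(z, c_cond) -> ws and G.synthesis(ws, c_render) -> {'image': ...}.
import torch
import dnnlib
import legacy

device = torch.device("cuda")
with dnnlib.util.open_url("ffhq512-128.pkl") as f:      # hypothetical checkpoint path
    G = legacy.load_network_pkl(f)["G_ema"].to(device)

# 25-dim camera label: flattened 4x4 cam2world + flattened 3x3 intrinsics,
# the same layout as the matrices posted earlier in this thread.
cam2world = torch.tensor([[1.0, 0.0,  0.0, 0.0,
                           0.0, -1.0, 0.0, 0.0,
                           0.0, 0.0, -1.0, 2.7,
                           0.0, 0.0,  0.0, 1.0]], device=device)  # roughly frontal pose
intrinsics = torch.tensor([[4.2647, 0, 0.5, 0, 4.2647, 0.5, 0, 0, 1]], device=device)
c = torch.cat([cam2world, intrinsics], dim=1)

# Sampling path: the mapping network is conditioned on a camera label,
# entangling identity with pose.
z = torch.randn(1, G.z_dim, device=device)
ws = G.mapping(z, c, truncation_psi=0.7)
img = G.synthesis(ws, c)["image"]

# Inversion path used here: the optimized ws skips the mapping network entirely,
# so no camera conditioning is applied -- the point raised above.
ws_opt = torch.load("optimized_ws.pt").to(device)       # hypothetical saved latent
img_opt = G.synthesis(ws_opt, c)["image"]
```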

I also tried directly using the optimized ws, but found that the result may contain some artifacts on shalini's example:

https://user-images.githubusercontent.com/46330265/225710735-7ca05898-b49b-41c3-bb67-c463c8eb5265.mp4

There are some artifacts visible in the result (example image attached).

I went through the issue posts in the original eg3d repo but didn't find any useful information. From your experiments so far, do you have any takeaway conclusions about not including the camera information in the ws?

luchaoqi avatar Mar 16 '23 16:03 luchaoqi

Thanks! But from the official website, it seems they are able to get pretty good results with just a single image + PTI. inversion_compressed.mp4

The input image you used has the ears occluded, while the input images in the video contain more complete information. This repo cannot generate regions that are occluded.

You can see the results I generated using my repo: NVlabs/eg3d#28 (comment); here are the re-aligned input image and the input camera parameters: 01457.zip 01457

Why does the regenerated image look different from the original image? It should be noted that this image is not from the FFHQ dataset.

regenerated image: 400

source image: 00001

pfeducode avatar Apr 05 '23 14:04 pfeducode

What do you mean by "the regenerated image looks different from the original image"? If you mean that the regenerated image cannot capture some of the fine-level details in the original image, that is caused by the limited expressive power of the generative adversarial network. If you want to preserve the details, you can try https://github.com/jiaxinxie97/HFGI3D, which can achieve better performance than my simple projector implementation.

oneThousand1000 avatar Apr 05 '23 14:04 oneThousand1000