
Quality drop with non-512x512 width and height

Open RockySong opened this issue 1 year ago • 3 comments

When I change -W and -H in infer_audio2vid_acc.py to a non-512x512 size, such as (384, 384), (1024, 1024), or (256, 256), the lip motion is degraded to varying degrees. The worst case is 1024x1024, where the motion of the whole face is destroyed and the face becomes unrecognizable.

-W 384 -H 384: https://github.com/user-attachments/assets/b4e85c38-760a-4f10-b87a-826dd4c774d8

-W 256 -H 256: https://github.com/user-attachments/assets/137c934b-0c62-4a89-81f1-70b15a7b54c3

-W 1024 -H 1024: https://github.com/user-attachments/assets/2d02e707-51c4-4eb5-8d4b-9a0575774021

RockySong avatar Oct 03 '24 03:10 RockySong

The model is trained on a 512x512 dataset.

You can upscale after the generation is complete.
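As a rough illustration of that workflow: generate at the native 512x512 first, then rescale the finished video as a separate step. The sketch below assumes ffmpeg is installed and on PATH. The helper names and the file names are hypothetical and not part of EchoMimic; plain Lanczos rescaling is only one option, and a dedicated super-resolution model may give sharper results.

```python
import shutil
import subprocess


def build_upscale_cmd(src, dst, width=1024, height=1024):
    """Return an ffmpeg argument list that rescales src to width x height.

    Hypothetical helper, not part of EchoMimic. It uses ffmpeg's scale
    filter with Lanczos resampling and copies the audio stream unchanged.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # the 512x512 video produced by inference
        "-vf", f"scale={width}:{height}:flags=lanczos",
        "-c:a", "copy",            # keep the original audio as-is
        dst,
    ]


def upscale(src, dst, width=1024, height=1024):
    """Run the rescale step; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_upscale_cmd(src, dst, width, height), check=True)
```

For example, `upscale("output_512.mp4", "output_1024.mp4")` would turn a 512x512 result into a 1024x1024 file (both file names are placeholders).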

nitinmukesh avatar Oct 03 '24 18:10 nitinmukesh

Then what do the width and height settings do? Are they there for cropping the video before/after generation?

TanvirHafiz avatar Oct 26 '24 12:10 TanvirHafiz

> Then what do the width and height settings do? Are they there for cropping the video before/after generation?

Unfortunately, I'm not a developer, so I can't explain any further. Someone who understands the code can explain.

nitinmukesh avatar Oct 26 '24 13:10 nitinmukesh