fenghe12

41 comments of fenghe12

I feel like all these methods that extract frames from FF++ (FaceForensics++) videos are hard to reproduce.
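For what it's worth, a minimal sketch of the frame-extraction step with OpenCV (the function name and sampling interval are mine, not from any FF++ paper's official preprocessing):

```python
import cv2
import os

def extract_frames(video_path: str, out_dir: str, every_n: int = 10) -> int:
    """Sample every n-th frame from a video and save the frames as PNGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```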

Maybe you can refer to their model code; at least we have that. The training code looks just like Animate Anyone's.

Yes, I also think a reader/writer mechanism is used to implement it: the image of the target character is input into the ReferenceNet to extract reference feature maps from the self-attention layer outputs...
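A minimal PyTorch sketch of what that reader/writer attention could look like (class and method names are mine; EMO's actual code is unreleased, so this is only an assumption about the mechanism):

```python
import torch
import torch.nn as nn

class RefBank:
    """Writer side: collects hidden states from ReferenceNet self-attention layers."""
    def __init__(self):
        self.feats: list[torch.Tensor] = []

    def write(self, h: torch.Tensor) -> None:
        self.feats.append(h)

class ReaderSelfAttention(nn.Module):
    """Reader side: backbone self-attention that also attends to reference features.

    Keys/values come from the concatenation of the denoising hidden states and
    the matching ReferenceNet feature map along the token dimension; queries
    come only from the denoising stream.
    """
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        kv = torch.cat([x, ref], dim=1)   # (B, N_x + N_ref, C)
        out, _ = self.attn(x, kv, kv)
        return x + out
```

In this reading, one forward pass of the ReferenceNet on the target image fills the bank, and each denoising step then reads the stored features layer by layer.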

OK, I'll take a look. We can use a pretrained model; in fact, Alibaba also uses a pretrained model (Stable Diffusion v1.5 from Hugging Face). The ReferenceNet and the backbone inherit weights from the original SD...
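A short sketch of that initialization with diffusers, assuming the standard SD 1.5 checkpoint (the variable names are mine):

```python
from diffusers import UNet2DConditionModel

sd_repo = "runwayml/stable-diffusion-v1-5"

# Backbone denoising UNet and the ReferenceNet both start from the same SD 1.5 weights.
backbone = UNet2DConditionModel.from_pretrained(sd_repo, subfolder="unet")
reference_net = UNet2DConditionModel.from_pretrained(sd_repo, subfolder="unet")
```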

Yes, maybe it's just a combination of existing methods and modules. I'll take a look later.

Have you run through the entire process? Congratulations! Let me take a look at the repo and code!

VASA adopts DiT as its backbone denoising network, but the paper lacks details about how to integrate the conditions into the DiT. I attempted to replace the UNet in Moore-AnimateAnyone with a DiT (Latte,...
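One common way to inject conditions into a DiT block is adaLN-zero modulation from the original DiT paper; whether VASA actually does this is my assumption, not something the paper confirms. A minimal sketch, with the condition embedding `c` standing in for whatever codes (audio, gaze, head pose) get fused upstream:

```python
import torch
import torch.nn as nn

class AdaLNZeroDiTBlock(nn.Module):
    """One DiT block with adaLN-zero conditioning.

    The condition embedding c regresses per-block scale/shift/gate parameters;
    gates are zero-initialized so training starts from the identity mapping.
    """
    def __init__(self, dim: int, heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(), nn.Linear(mlp_ratio * dim, dim)
        )
        # Regress 6 modulation tensors from the condition; zero-init the projection.
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
        nn.init.zeros_(self.ada[1].weight)
        nn.init.zeros_(self.ada[1].bias)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token sequence; c: (B, C) condition embedding.
        s1, b1, g1, s2, b2, g2 = self.ada(c).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        x = x + g2.unsqueeze(1) * self.mlp(h)
        return x
```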

I guess DiT will become the mainstream video-generation architecture, because of Sora.

Maybe I can offer some help.

> hey @fenghe12
>
> i had some success with megaportraits https://github.com/johndpope/MegaPortrait-hack
>
> and now attempting to integrate into VASA on this branch....