take a look at this
https://github.com/Zejun-Yang/AniPortrait - a new open-source talking head generation repo. It seems very similar to Emote.
I did see it - I'll add it to the README. Seems like a complete rip-off of the MooreThreads AnimateAnyone PoseGuider: https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/train_stage_1.py#L54
The disappointing thing is there's no training code, so all the models are locked up.
Maybe you can refer to their model code - at least we have that. The training code looks just like AnimateAnyone's.
Looking at the saved models, it looks less scary than this mess: https://github.com/MStypulkowski/diffused-heads/issues/21
We can probably just load these into a simpler architecture.
I think the reader/writer is just there to run the parallel UNet (to dig the features out of ReferenceNet - the reader - and throw them to the backbone): https://github.com/Zejun-Yang/AniPortrait/blob/main/train_stage_1.py#L53
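For what it's worth, here's a toy sketch of that writer/reader idea - my own naming and shapes, not AniPortrait's code. In the real repos ReferenceNet and the denoising UNet are two separate networks initialised from the same SD weights; here one block list plays both roles to keep the sketch short.

```python
# "write" banks the reference self-attention features; "read" lets the backbone's
# self-attention attend over them as extra tokens.
import torch
import torch.nn as nn

class SelfAttnBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.feature_bank = []      # filled during the reference ("write") pass
        self.mode = "write"         # "write" or "read"

    def forward(self, x):
        h = self.norm(x)
        if self.mode == "write":
            self.feature_bank.append(h.detach())            # writer: stash reference features
            kv = h
        else:
            kv = torch.cat([h] + self.feature_bank, dim=1)  # reader: self + reference tokens
        out, _ = self.attn(h, kv, kv)
        return x + out

def set_mode(blocks, mode):
    for b in blocks:
        b.mode = mode

blocks = nn.ModuleList([SelfAttnBlock(64) for _ in range(2)])

ref_tokens = torch.randn(1, 77, 64)    # features of the reference image
noisy_tokens = torch.randn(1, 77, 64)  # features of the noisy target frame

set_mode(blocks, "write")
h = ref_tokens
for b in blocks:
    h = b(h)                           # reference pass fills the banks

set_mode(blocks, "read")
h = noisy_tokens
for b in blocks:
    h = b(h)                           # denoising pass reads the banked features
print(h.shape)                         # torch.Size([1, 77, 64])
```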
Yes, I also think the reader/writer is used to implement this, from the paper's description: "The image of the target character is input into ReferenceNet to extract the reference feature maps output from the self-attention layers. During the Backbone denoising procedure, the features of the corresponding layers go through reference-attention layers with the extracted feature maps." And the second link looks like the pretrained weights from Moore-AnimateAnyone (another talking head generation repo). We're now also trying to implement EMO, referring to your existing repo, and I'm finishing the Face Locator today. Thanks!
Did you see this? https://github.com/johndpope/Emote-hack/issues/28 - I think we can just piggyback off the Alibaba pretrained UNet model.
OK, I'll take a look. We can use a pretrained model - in fact, Alibaba also uses a pretrained model (Stable Diffusion v1.5 from Hugging Face). ReferenceNet and the Backbone inherit their weights from the original SD UNet; only the attention layers were changed.
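For concreteness, a minimal sketch of what that weight inheritance looks like in practice (assumes the diffusers package; "runwayml/stable-diffusion-v1-5" is the usual checkpoint id, swap in whichever copy of the SD 1.5 weights you have):

```python
from diffusers import UNet2DConditionModel

base = "runwayml/stable-diffusion-v1-5"
reference_unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
denoising_unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
# both start from identical SD 1.5 weights; the repos then only swap/extend the
# attention processors (reference attention + temporal/motion modules) on top.
```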
that's why I'm thinking it will be plug and play. Got all the models for AniPortrait - check this helper out https://github.com/xmu-xiaoma666/External-Attention-pytorch/issues/115
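A quick, generic way to peek inside the downloaded .pth files before trying to load them into a simpler architecture (the file names below are the ones from the AniPortrait yaml further down this thread - adjust paths as needed):

```python
import torch

for name in ["denoising_unet.pth", "reference_unet.pth",
             "pose_guider.pth", "motion_module.pth"]:
    sd = torch.load(f"./pretrained_model/{name}", map_location="cpu")
    print(name, "-", len(sd), "tensors")
    for k in list(sd)[:5]:                      # first few keys as a sanity check
        print("  ", k, tuple(sd[k].shape))
```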
Yes, maybe it's just a combination of existing methods and modules. I'll take a look later.
AniPortrait is good. I thought ControlNetMediaPipeFace might be the best solution: https://github.com/johndpope/Emote-hack/issues/23
@chris-crucible - it seems they've enhanced the lips beyond the base MediaPipe face mesh - maybe worth retraining the model? Here's the default sample: https://drive.google.com/file/d/198TWE631UX1z_YzbT31ItdF6yhSyLycL/view?usp=sharing
https://github.com/Zejun-Yang/AniPortrait/blob/bfa15742af3233c297c72b8bb5d7637c5ef5984a/src/utils/draw_util.py#L36
FACEMESH_LIPS_OUTER_BOTTOM_LEFT = [(61,146),(146,91),(91,181),(181,84),(84,17)]
FACEMESH_LIPS_OUTER_BOTTOM_RIGHT = [(17,314),(314,405),(405,321),(321,375),(375,291)]
FACEMESH_LIPS_INNER_BOTTOM_LEFT = [(78,95),(95,88),(88,178),(178,87),(87,14)]
FACEMESH_LIPS_INNER_BOTTOM_RIGHT = [(14,317),(317,402),(402,318),(318,324),(324,308)]
FACEMESH_LIPS_OUTER_TOP_LEFT = [(61,185),(185,40),(40,39),(39,37),(37,0)]
FACEMESH_LIPS_OUTER_TOP_RIGHT = [(0,267),(267,269),(269,270),(270,409),(409,291)]
FACEMESH_LIPS_INNER_TOP_LEFT = [(78,191),(191,80),(80,81),(81,82),(82,13)]
FACEMESH_LIPS_INNER_TOP_RIGHT = [(13,312),(312,311),(311,310),(310,415),(415,308)]
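For anyone who wants to visualise those extra edges, here's a rough sketch (not the repo's draw_util itself) that draws them with OpenCV, assuming the FACEMESH_* lists above and an (N, 2) array of face-mesh landmarks already projected to pixel coordinates:

```python
import cv2
import numpy as np

LIP_EDGES = (FACEMESH_LIPS_OUTER_BOTTOM_LEFT + FACEMESH_LIPS_OUTER_BOTTOM_RIGHT +
             FACEMESH_LIPS_INNER_BOTTOM_LEFT + FACEMESH_LIPS_INNER_BOTTOM_RIGHT +
             FACEMESH_LIPS_OUTER_TOP_LEFT + FACEMESH_LIPS_OUTER_TOP_RIGHT +
             FACEMESH_LIPS_INNER_TOP_LEFT + FACEMESH_LIPS_INNER_TOP_RIGHT)

def draw_lips(image: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    # each entry is a (start, end) pair of MediaPipe landmark indices
    for start, end in LIP_EDGES:
        p0 = tuple(int(v) for v in landmarks[start])
        p1 = tuple(int(v) for v in landmarks[end])
        cv2.line(image, p0, p1, color=(0, 0, 255), thickness=2)
    return image
```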
I get that there are still expression issues here, but the result is quite good.
The head rotations might be a nice branch for getting the emotion into the video.
Have you run through the entire process? Congratulations! Let me take a look at the repo and code!
python ./scripts/vid2vid.py --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -L 256
animation_facereenac.yaml
pretrained_base_model_path: '/media/2TB/Emote-hack/pretrained_models/StableDiffusion'
pretrained_vae_path: "stabilityai/sd-vae-ft-mse"
image_encoder_path: '/media/oem/12TB/AniPortrait/pretrained_model/image_encoder'
denoising_unet_path: "./pretrained_model/denoising_unet.pth"
reference_unet_path: "./pretrained_model/reference_unet.pth"
pose_guider_path: "./pretrained_model/pose_guider.pth"
motion_module_path: "./pretrained_model/motion_module.pth"
inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'
test_cases:
  "./configs/inference/ref_images/lyl.png":
    - '/media/2TB/Emote-hack/junk/M2Ohb0FAaJU_1.mp4'
There's no speed embedding, so the vanilla image-to-video will mostly hold the face still in the video - but because they're using the AnimateAnyone framework, they get video2video out of the box, allowing this: https://drive.google.com/file/d/1HaHPZbllOVPhbGkvV3aHLtcEew9CZGUV/view
hey @fenghe12
I had some success with MegaPortraits: https://github.com/johndpope/MegaPortrait-hack
and am now attempting to integrate it into VASA on this branch: https://github.com/johndpope/VASA-1-hack/tree/MegaPortraits
VASA adopts DiT as the backbone denoising network, but the paper lacks details about how to integrate the conditions into the DiT. I attempted to replace the UNet in Moore-AnimateAnyone with a DiT (Latte, a video generation model), but the results were not satisfactory. We are now trying to train a talking-face video generation model based on Open-Sora-Plan.
I guess DiT will become the mainstream video generation architecture because of Sora.
Maybe I can offer some help.
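For concreteness, a toy sketch (my own, not VASA or Latte code) of the two usual ways conditions get injected into a DiT block - adaLN modulation for global signals like the timestep and head pose, and cross-attention for token sequences like audio features:

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))  # predicts shift/scale/gate

    def forward(self, x, global_cond, token_cond):
        s1, sc1, g1, s2, sc2, g2 = self.ada(global_cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + sc1.unsqueeze(1)) + s1.unsqueeze(1)   # adaLN modulation
        x = x + g1.unsqueeze(1) * self.attn(h, h, h)[0]                # gated self-attention
        x = x + self.cross(x, token_cond, token_cond)[0]               # cross-attn over audio tokens
        h = self.norm2(x) * (1 + sc2.unsqueeze(1)) + s2.unsqueeze(1)
        return x + g2.unsqueeze(1) * self.mlp(h)

block = DiTBlock(256)
x = torch.randn(2, 64, 256)   # latent patch tokens
g = torch.randn(2, 256)       # timestep + pose/expression embedding
a = torch.randn(2, 50, 256)   # audio feature tokens
print(block(x, g, a).shape)   # torch.Size([2, 64, 256])
```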
I'm attempting to port matmul-free LLM to PyTorch:
https://github.com/ridgerchu/matmulfreellm
I sent you a link - it could be more exciting if I can get the CUDA code working.
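As I understand it (treat this as a rough approximation of what matmulfreellm's BitLinear-style layers do, not its actual fused kernels), the core trick is ternary weights, which turn each "matmul" into signed additions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)                     # absmean scale
        w_ternary = torch.clamp(torch.round(w / scale), -1, 1)     # weights in {-1, 0, +1}
        # straight-through estimator: ternary forward, full-precision gradient
        w_q = w / scale + (w_ternary - w / scale).detach()
        return F.linear(x, w_q * scale)

layer = TernaryLinear(16, 8)
print(layer(torch.randn(4, 16)).shape)   # torch.Size([4, 8])
```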
Sorry, but I can't open your invitation link. It gives me a 404 error.
Sorry - that project was 3 days down the gurgler, but I learned how to compile CUDA code.
Not sure how to handle the audio stuff - wav2vec - https://github.com/johndpope/VASA-1-hack/tree/MegaPortraits
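A common starting point for the audio side (a minimal sketch assuming the transformers package and the public facebook/wav2vec2-base-960h checkpoint - not whatever VASA actually uses, which isn't specified) is just extracting per-frame wav2vec 2.0 features for the conditioning module:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "facebook/wav2vec2-base-960h"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id).eval()

waveform = torch.randn(16000)  # placeholder: 1 second of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    feats = model(**inputs).last_hidden_state   # per-frame audio features, ~50 frames/s
print(feats.shape)  # roughly (1, 49, 768)
```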
How can I help you?