Open-AnimateAnyone
train stage 1 oom
Great work! We tried to train the stage 1 model but hit an OOM error. We use four 80 GB A100 GPUs, and the config YAML is: train_batch_size: 4, sample_size: 512 (# for 40G use 256), sample_stride: 4, sample_n_frames: 16, mixed_precision_training: False, enable_xformers_memory_efficient_attention: False. We launch training with torchrun --nnodes=1 --nproc_per_node=4 train_hack.py --config configs/training/train_stage_1.yaml. Any ideas on how to solve this?
That's strange; I train at 512 resolution with a batch size of 8 on a single card.
After I set mixed_precision_training and enable_xformers_memory_efficient_attention to True, training runs normally. But I have another question: in train_hack.py the PoseGuider output is set to 320 channels (poseguider = PoseGuider(noise_latent_channels=320)), but at inference time (python3 -m pipelines.animation_stage_1 --config configs/prompts/animation_stage_1.yaml) the PoseGuider is loaded with 4 channels (model = PoseGuider(noise_latent_channels=4)), and the way it is used later in the pipeline also corresponds to the 4-channel case (latent_model_input = self.scheduler.scale_model_input(latent_model_input, t) + latents_pose). Why is that? It looks like inference does not use your hacked UNet structure?
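The channel mismatch described above comes down to array shapes: a 4-channel pose latent can be added directly to the scheduler-scaled latent input, while a 320-channel pose feature only fits after the UNet's first convolution (conv_in outputs 320 channels in SD 1.x), which is why it must go through the hacked UNet. A minimal sketch tracking shapes only (the shapes are the standard Stable Diffusion ones; `add_shapes` and `conv_in_out` are illustrative stand-ins, not repo code):

```python
# Elementwise add is only defined when channel counts match, so we model
# tensors by their shapes alone: (batch, channels, height, width).
def add_shapes(a, b):
    if a[1] != b[1]:
        raise ValueError(f"channel mismatch: {a[1]} vs {b[1]}")
    return a

latents = (1, 4, 64, 64)          # standard SD latent for a 512px image
latents_pose_4 = (1, 4, 64, 64)   # PoseGuider(noise_latent_channels=4) output
conv_in_out = (1, 320, 64, 64)    # stand-in for the UNet conv_in output (SD 1.x: 320 ch)
pose_feat_320 = (1, 320, 64, 64)  # PoseGuider(noise_latent_channels=320) output

# Inference path in the question: pose latents added to the latent input,
# which requires the 4-channel PoseGuider.
latent_model_input = add_shapes(latents, latents_pose_4)

# Training path: 320-channel features fit only after conv_in, i.e. inside
# the hacked UNet.
fused = add_shapes(conv_in_out, pose_feat_320)

# Mixing the two paths fails:
try:
    add_shapes(latents, pose_feat_320)
except ValueError as e:
    print(e)  # channel mismatch: 4 vs 320
```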
You can run inference in Gradio mode using the latest demo folder; it is more convenient.
@guoqincode, the stage 1 batch size in the original paper is 64. Even with enable_xformers_memory_efficient_attention=True and 8 A100 80G GPUs, my train batch size is only 32. Does this affect the effectiveness of stage 1 training?
Hi, I ran into the same problem. Have you solved it?
Solved: when initializing the PoseGuider for inference, initialize it with 320 channels.
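In other words, the inference script should construct the PoseGuider with the same channel width as train_hack.py. A minimal sketch with a hypothetical stand-in class (the real PoseGuider lives in this repo; only the constructor argument matters here):

```python
# Hypothetical stand-in for the repo's PoseGuider: the only detail that
# matters for this fix is the noise_latent_channels constructor argument.
class PoseGuiderStandIn:
    def __init__(self, noise_latent_channels=4):
        self.out_channels = noise_latent_channels

# The fix from this thread: at inference, build the PoseGuider with 320
# channels so its output matches the hacked UNet's conv_in features,
# exactly as in train_hack.py (not 4, as the old inference script did).
model = PoseGuiderStandIn(noise_latent_channels=320)
print(model.out_channels)  # 320
```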