Open-AnimateAnyone
train stage 1 oom
Great work! We tried to train the stage 1 model but hit an OOM error. We use four 80 GB A100 GPUs, and the config YAML is: train_batch_size: 4, sample_size: 512 (# for 40G use 256), sample_stride: 4, sample_n_frames: 16, mixed_precision_training: False, enable_xformers_memory_efficient_attention: False. We launch training with torchrun --nnodes=1 --nproc_per_node=4 train_hack.py --config configs/training/train_stage_1.yaml. Any ideas on how to solve this?
That's strange; I train at 512 resolution with a batch size of 8 on a single card.
After I set mixed_precision_training and enable_xformers_memory_efficient_attention to True, training runs normally. But I have another question: in train_hack.py the PoseGuider output is set to 320 channels (poseguider = PoseGuider(noise_latent_channels=320)), but at inference time (python3 -m pipelines.animation_stage_1 --config configs/prompts/animation_stage_1.yaml) the PoseGuider is loaded with 4 channels (model = PoseGuider(noise_latent_channels=4)), and the way it is used later in the pipeline also corresponds to the 4-channel case (latent_model_input = self.scheduler.scale_model_input(latent_model_input, t) + latents_pose). Why is that? It looks like inference does not use your hacked UNet structure?
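The channel mismatch described above comes down to array shapes: a 4-channel pose latent can be added directly to the scheduler-scaled latent input, while a 320-channel pose feature only fits after the UNet's first convolution (conv_in outputs 320 channels in SD 1.x), which is why it must go through the hacked UNet. A minimal sketch tracking shapes only (the shapes are the standard Stable Diffusion ones; `add_shapes` and `conv_in_out` are illustrative stand-ins, not repo code):

```python
# Elementwise add is only defined when channel counts match, so we model
# tensors by their shapes alone: (batch, channels, height, width).
def add_shapes(a, b):
    if a[1] != b[1]:
        raise ValueError(f"channel mismatch: {a[1]} vs {b[1]}")
    return a

latents = (1, 4, 64, 64)          # standard SD latent for a 512px image
latents_pose_4 = (1, 4, 64, 64)   # PoseGuider(noise_latent_channels=4) output
conv_in_out = (1, 320, 64, 64)    # stand-in for the UNet conv_in output (SD 1.x: 320 ch)
pose_feat_320 = (1, 320, 64, 64)  # PoseGuider(noise_latent_channels=320) output

# Inference path in the question: pose latents added to the latent input,
# which requires the 4-channel PoseGuider.
latent_model_input = add_shapes(latents, latents_pose_4)

# Training path: 320-channel features fit only after conv_in, i.e. inside
# the hacked UNet.
fused = add_shapes(conv_in_out, pose_feat_320)

# Mixing the two paths fails:
try:
    add_shapes(latents, pose_feat_320)
except ValueError as e:
    print(e)  # channel mismatch: 4 vs 320
```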
You can run inference in Gradio mode using the latest demo folder; it is more convenient.
@guoqincode, the stage 1 batch size in the original paper is 64. Even with enable_xformers_memory_efficient_attention=True and 8 A100 80G GPUs, my train batch size is only 32. Does this affect the effectiveness of stage 1 training?
Hi, I ran into the same problem. Have you solved it?
Solved: when initializing the PoseGuider for inference, initialize it with 320 channels.
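In other words, the inference script should construct the PoseGuider with the same channel width as train_hack.py. A minimal sketch with a hypothetical stand-in class (the real PoseGuider lives in this repo; only the constructor argument matters here):

```python
# Hypothetical stand-in for the repo's PoseGuider: the only detail that
# matters for this fix is the noise_latent_channels constructor argument.
class PoseGuiderStandIn:
    def __init__(self, noise_latent_channels=4):
        self.out_channels = noise_latent_channels

# The fix from this thread: at inference, build the PoseGuider with 320
# channels so its output matches the hacked UNet's conv_in features,
# exactly as in train_hack.py (not 4, as the old inference script did).
model = PoseGuiderStandIn(noise_latent_channels=320)
print(model.out_channels)  # 320
```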