Moore-AnimateAnyone
What are the differences from MagicAnimate?
Great work, but I have some questions. From the code, the way ReferenceNet is connected to the UNet looks similar to MagicAnimate, whereas the Animate Anyone paper does not seem to describe it this way. The spatial attention is also handled the same way as in MagicAnimate. Or did I misunderstand something? Please help clarify.
I have the same question. In the paper, the outputs of ReferenceNet and the UNet are concatenated along the width dimension, but the code concatenates along dim=1 (the token axis `l`):

    bank_fea = [
        rearrange(
            d.unsqueeze(1).repeat(1, video_length, 1, 1),
            "b t l c -> (b t) l c",
        )
        for d in self.bank
    ]
    modify_norm_hidden_states = torch.cat(
        [norm_hidden_states] + bank_fea, dim=1
    )
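For what it's worth, the two layouts may be closer than they look. Below is a minimal, dependency-free sketch (plain Python, not the repo's code) comparing them on a toy example. It rests on several assumptions that are not confirmed by the snippet above: a single frame, a single attention head, no positional encoding in the spatial attention, shared projection weights, and queries taken only from the UNet tokens in the code-style variant. All names here (`attention`, `out_paper`, `out_code`, the toy sizes) are made up for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q_tokens, kv_tokens, wq, wk, wv):
    """Single-head scaled dot-product attention without positional encoding."""
    q, k, v = matmul(q_tokens, wq), matmul(kv_tokens, wk), matmul(kv_tokens, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[scale * s for s in row] for row in matmul(q, k_t)]
    return matmul([softmax(row) for row in scores], v)


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
H, W, C = 2, 3, 4  # toy sizes, hypothetical

# UNet and reference feature maps as H x W x C nested lists.
x = [rand_matrix(W, C) for _ in range(H)]
ref = [rand_matrix(W, C) for _ in range(H)]
wq, wk, wv = rand_matrix(C, C), rand_matrix(C, C), rand_matrix(C, C)

# Paper-style: concatenate along width (H x 2W x C), run self-attention
# over all 2*H*W tokens, then keep only the first W columns of each row.
wide = [x_row + ref_row for x_row, ref_row in zip(x, ref)]
wide_tokens = [tok for row in wide for tok in row]
wide_out = attention(wide_tokens, wide_tokens, wq, wk, wv)
out_paper = [wide_out[r * 2 * W + c] for r in range(H) for c in range(W)]

# Code-style: flatten to tokens, concatenate along the token axis,
# and use only the UNet tokens as queries (an assumption, see above).
x_tokens = [tok for row in x for tok in row]
ref_tokens = [tok for row in ref for tok in row]
out_code = attention(x_tokens, x_tokens + ref_tokens, wq, wk, wv)

max_diff = max(
    abs(a - b) for ra, rb in zip(out_paper, out_code) for a, b in zip(ra, rb)
)
print("max abs difference:", max_diff)
```

Under these assumptions the two outputs should agree up to floating-point rounding, because attention without positional encoding does not depend on the order of the key/value tokens, so it does not matter whether the reference tokens are interleaved row by row (width concat) or appended at the end (token concat). If the actual implementation differs from any of the assumptions, the equivalence may not hold.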