Open-AnimateAnyone Why are ReferenceNet features used to query in reader mode?

Open Sireer opened this issue 1 year ago • 3 comments

hidden_states_uc = self.attn1(modify_norm_hidden_states,
                              encoder_hidden_states=modify_norm_hidden_states,
                              attention_mask=attention_mask)[:, :hidden_states.shape[-2], :] + hidden_states

# hidden_states_uc = self.attn1(norm_hidden_states,
#                               encoder_hidden_states=torch.cat([norm_hidden_states] + self.bank, dim=1),
#                               attention_mask=attention_mask) + hidden_states

Why is the code from Magic Animate commented out? Is there a problem with it?

Jan 04 '24 03:01 Sireer

Hi, you can look at the description of spatial attention in the paper; we only need to take the first half.

Jan 04 '24 04:01 guoqincode

Yes, but we do not need to compute the second half at all. It seems that the code from Magic Animate produces the first half directly, without computing the second half.

hidden_states_uc = self.attn1(norm_hidden_states,
                              encoder_hidden_states=torch.cat([norm_hidden_states] + self.bank, dim=1),
                              attention_mask=attention_mask) + hidden_states

Jan 04 '24 07:01 Sireer
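
To put a rough number on the saving discussed in the comment above, here is a small sketch that counts the entries of the attention score matrix in both variants. The function name `score_entries` and the token counts are hypothetical, chosen only for illustration; the sketch assumes a single reference bank with as many tokens as the denoising feature map.

```python
def score_entries(n_tokens: int, n_ref: int, concat_query: bool) -> int:
    """Count query-key score entries for one attention head.

    n_tokens: tokens in the denoising UNet feature map (hypothetical value below)
    n_ref:    tokens contributed by the ReferenceNet bank (hypothetical value below)
    """
    n_keys = n_tokens + n_ref  # keys/values are concatenated in both variants
    n_queries = n_keys if concat_query else n_tokens
    return n_queries * n_keys


# Assumed sizes: one reference bank with as many tokens as the feature map.
n_tokens = n_ref = 1024

full = score_entries(n_tokens, n_ref, concat_query=True)      # concat, attend, then slice
reduced = score_entries(n_tokens, n_ref, concat_query=False)  # query from hidden states only
ratio = full / reduced

print(full, reduced, ratio)
```

Under these assumptions the concatenated-query variant computes twice as many score entries, and the extra half is exactly the part that gets sliced away.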

Indeed, the Magic Animate implementation is equivalent to the "concat then split" operation of AnimateAnyone. The query does not need to be concatenated: concatenating the query and then taking the first half only adds computational overhead. (It affects neither training nor inference.)

For trained models, inference with the "magic animate style" code also works well.

Jan 07 '24 15:01 luyvlei
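
The equivalence described above can be checked numerically. The following is a minimal sketch, not the repository's code: it uses NumPy with a hypothetical single-head `attention` helper and random projection matrices as a stand-in for `self.attn1`, and it omits the attention mask. The argument relies on each output row depending only on its own query row plus the full set of keys and values, so slicing the rows after attention gives the same result as never adding the extra query rows.

```python
import numpy as np


def attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention (hypothetical stand-in for attn1)."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
batch, n_tokens, n_ref, dim = 2, 6, 6, 8  # illustrative sizes only

hidden = rng.standard_normal((batch, n_tokens, dim))   # plays the role of norm_hidden_states
reference = rng.standard_normal((batch, n_ref, dim))   # plays the role of the ReferenceNet bank
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

# Keys and values are the concatenation along the token axis in both variants.
context = np.concatenate([hidden, reference], axis=1)

# "Concat then split": the query is also concatenated, then the first half is kept.
concat_then_split = attention(context, context, w_q, w_k, w_v)[:, :n_tokens, :]

# "Magic Animate style": the query comes from the hidden states only.
query_only = attention(hidden, context, w_q, w_k, w_v)

equivalent = np.allclose(concat_then_split, query_only)
print(equivalent)
```

The same reasoning should carry over to multi-head attention with an output projection, since those operations also act on each token independently, but that is not shown in this sketch.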
