DiffTalk: How can the driving-audio feature a and the landmark representation l be used in the cross-attention module?

Open · Haoqing-Wang opened this issue 1 year ago · 1 comment

As we all know, the driving-audio feature a and the landmark representation l are each just a single vector, not a set of token vectors. So how can they be used as the Key and Value in the cross-attention module?

Aug 21 '23 09:08 Haoqing-Wang
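The question hinges on what cross-attention does when the Key/Value sequence has length 1. What follows is a minimal, pure-Python sketch of single-head scaled dot-product cross-attention. All names (`cross_attention`, `audio_vec`, `landmark_vec`, the weight matrices `Wq`/`Wk`/`Wv`) are hypothetical illustrations, not DiffTalk's code, and it is an assumption, not confirmed here, that DiffTalk uses either case shown. Case 1 passes one conditioning vector as a context sequence of length 1; case 2 stacks a and l as two context tokens (assuming they share a dimension).

```python
import math
import random


def matvec(W, x):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.

    queries: n query tokens (e.g. spatial positions of the image latent)
    context: m conditioning tokens; m = 1 is allowed
    Returns (outputs, attention_weights), one row per query.
    """
    keys = [matvec(Wk, c) for c in context]
    values = [matvec(Wv, c) for c in context]
    scale = math.sqrt(len(keys[0]))
    outputs, weights_per_query = [], []
    for tok in queries:
        q = matvec(Wq, tok)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)  # distribution over the m context tokens
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        weights_per_query.append(weights)
    return outputs, weights_per_query


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_latent, d_cond, d_attn, n_queries = 4, 8, 4, 6
Wq = rand_matrix(d_attn, d_latent, rng)
Wk = rand_matrix(d_attn, d_cond, rng)
Wv = rand_matrix(d_attn, d_cond, rng)
latent_tokens = rand_matrix(n_queries, d_latent, rng)  # queries come from the latent

# Hypothetical single conditioning vectors (stand-ins for a and l).
audio_vec = [rng.gauss(0.0, 1.0) for _ in range(d_cond)]
landmark_vec = [rng.gauss(0.0, 1.0) for _ in range(d_cond)]

# Case 1: one vector used as a context sequence of length 1.
out_one, w_one = cross_attention(latent_tokens, [audio_vec], Wq, Wk, Wv)

# Case 2: a and l stacked as two context tokens (assumes equal dimensions).
out_two, w_two = cross_attention(latent_tokens, [audio_vec, landmark_vec], Wq, Wk, Wv)
```

With a single context token the softmax runs over one score, so every attention weight is exactly 1 and every query position receives the same projection `Wv · a`: the attention reduces to a learned linear projection of the condition broadcast over the latent. Only with two or more context tokens (for example a and l stacked, or one vector split into chunks) do the weights depend on the query.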
