StoryDiffusion About RandSample

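Very nice work. Two questions: 1) Which lines of code implement the RandSample logic in Consistent Self-Attention? 2) Where do the tokens from the other images in the batch, which are needed for sampling, come from?

I also noticed two fairly obvious typos: 1) "Xk, Xq, and Xv stand for the query, key, and value used in attention calculation, respectively." 2) Algorithm 1 uses images_features and images_tokens inconsistently.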

May 06 '24 09:05 AlphaNext

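The results really are very nice; I support this work and am following it. Building on the original poster's question, I'd also like to ask: (1) The paper says "given a batch of image features" and "Consistent Self-Attention samples some tokens Si from other image features in the batch...". How should the batch dimension be interpreted at inference time? I don't quite understand where "a batch of image features" comes from during inference. I think this is the same doubt the original poster has. Thanks in advance for clarifying!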

May 06 '24 11:05 winnerahao

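Hi, thanks for your interest. (1) RandSample is here: https://github.com/HVision-NKU/StoryDiffusion/blob/main/utils/gradio_utils.py#L258 (2) Batch dimension: it corresponds to generating multiple images at the same time, with each image using a different prompt.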

May 16 '24 05:05 Z-YuPeng
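
To make the reply above concrete, here is a minimal pure-Python sketch of the idea as the paper excerpt quoted in this thread describes it: for each image in the batch, sample some tokens from the other images and let the image attend over its own tokens plus the sampled ones. This is an illustration, not the code at the linked line in `gradio_utils.py`. The names `rand_sample`, `attention`, `consistent_self_attention`, and the `rate` parameter are hypothetical, and the learned query/key/value projections are left out (treated as identity) to keep the sketch short.

```python
import math
import random


def rand_sample(batch_tokens, i, rate, rng):
    """Sample a fraction `rate` of the tokens of every image except image i."""
    pool = [tok for j, toks in enumerate(batch_tokens) if j != i for tok in toks]
    return rng.sample(pool, int(len(pool) * rate))


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of float vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))
        ])
    return out


def consistent_self_attention(batch_tokens, rate=0.5, seed=0):
    """Assumed reading of the paper: queries come from image i only,
    keys and values come from image i plus tokens sampled from the others."""
    rng = random.Random(seed)
    outputs = []
    for i, own in enumerate(batch_tokens):
        sampled = rand_sample(batch_tokens, i, rate, rng)
        paired = sampled + own  # sampled tokens joined with the image's own tokens
        outputs.append(attention(own, paired, paired))
    return outputs


# Toy batch: 3 images (one per prompt), 4 tokens each, 2 channels per token.
batch = [[[float(i), float(t)] for t in range(4)] for i in range(3)]
result = consistent_self_attention(batch, rate=0.5, seed=0)
```

In this sketch the batch is simply the set of images generated together, which matches the reply's description of the batch dimension at inference time. The output keeps the input shape: one list of attended tokens per image.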

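@Z-YuPeng Hi, I'd like to ask why random sampling works. Other related work generally uses cross-attention to obtain an object mask and then refers to the features of those specific regions. Or is it closer in spirit to dropout? Thanks.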

Jul 23 '24 06:07 garychan22
