StoryDiffusion

About RandSample

Open AlphaNext opened this issue 9 months ago • 3 comments

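Very nice work. Two questions: 1) Which lines of code implement the RandSample logic in Consistent Self-Attention? 2) Sampling tokens requires tokens from the other images in the batch; which part of the code supplies them?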

I also seem to have found two fairly obvious typos: 1) "Xk, Xq, and Xv stand for the query, key, and value used in attention calculation, respectively." 2) In Algorithm 1, images_features and images_tokens are used inconsistently.

AlphaNext avatar May 06 '24 09:05 AlphaNext

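The results really are very nice; I support this work and am following it. Building on the original poster's question, I would also like to ask: (1) The paper says "given a batch of image features" and "Consistent Self-Attention samples some tokens Si from other image features in the batch...". How should the batch dimension be interpreted at inference time? I don't quite understand where "a batch of image features" comes from during inference. I think this is the same question the original poster has. Could you please clarify? Thanks!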

winnerahao avatar May 06 '24 11:05 winnerahao

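Hi, thanks for your interest. (1) RandSample is here: https://github.com/HVision-NKU/StoryDiffusion/blob/main/utils/gradio_utils.py#L258 (2) Batch dimension: it corresponds to generating multiple images at the same time, with each image conditioned on a different prompt.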

Z-YuPeng avatar May 16 '24 05:05 Z-YuPeng
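
To make the batch interpretation above concrete, here is a minimal pure-Python sketch of sampling tokens from the other images in a batch and attending over them. It is not the code at the linked line of `gradio_utils.py`. The function names, the `rate` parameter, the identity query/key/value projections, and the choice to join sampled tokens with the image's own tokens before attention are all assumptions made for illustration.

```python
import math
import random


def rand_sample_tokens(batch_tokens, image_idx, rate, rng):
    """Sample a fraction `rate` of tokens from every image except `image_idx`.

    Hypothetical helper: `batch_tokens` is a list of images, each image a
    list of token vectors (lists of floats).
    """
    sampled = []
    for j, tokens in enumerate(batch_tokens):
        if j == image_idx:
            continue  # only tokens from the *other* images are sampled
        k = max(1, int(len(tokens) * rate))
        sampled.extend(rng.sample(tokens, k))
    return sampled


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))
        ])
    return out


def consistent_self_attention(batch_tokens, rate=0.5, seed=0):
    """For each image, attend over its own tokens plus sampled foreign tokens.

    Assumption: learned Q/K/V projections are omitted (identity), so the
    token vectors themselves act as queries, keys, and values.
    """
    rng = random.Random(seed)
    outputs = []
    for i, tokens in enumerate(batch_tokens):
        pool = tokens + rand_sample_tokens(batch_tokens, i, rate, rng)
        outputs.append(attention(tokens, pool, pool))
    return outputs


# Toy batch: 3 images (one per prompt), 4 tokens each, 8 channels per token.
_rng = random.Random(1)
batch = [[[_rng.gauss(0, 1) for _ in range(8)] for _ in range(4)] for _ in range(3)]
out = consistent_self_attention(batch, rate=0.5)
```

In this sketch every image keeps its own number of tokens in the output; only the key/value pool grows, which is how tokens from the other prompts' images can influence each image generated in the same batch.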

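@Z-YuPeng Hi, I would like to ask why random sampling works. Other related work usually uses cross-attention to obtain an object mask and then refers to the features of those specific regions. Or is it closer in spirit to dropout? Thanks.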

garychan22 avatar Jul 23 '24 06:07 garychan22