sigma comments

Results: 11 comments by sigma

Full generated ground truth for kitti raw?

> Thanks for your work. Using the script in this repository, I generated the ground truth for KITTI raw in bird's-eye view. However, the ground truth generated in...

Intuition of replacing PReLU

@Helios77760 Thanks for your code, but when I run this script, I get this error: "Cannot copy param 0 weights from layer 'PReLU_1'; shape mismatch. Source param shape is (1); target...

Does replacing PReLU with a ReLU layer, a Scale layer, and an ElementWise layer require retraining?

Thanks. To convert the weights, do I need to load the caffemodel, replace the PReLU layers, and then save it again?
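For readers following the replacement idea, here is a minimal sketch of the arithmetic it appears to rely on. It assumes the replacement uses the standard identity PReLU(x) = ReLU(x) - a * ReLU(-x); the exact layer graph used by the script in this thread is not shown here, and the names `prelu`, `prelu_decomposed`, and `slope` are hypothetical.

```python
def relu(x):
    """Plain ReLU on a scalar."""
    return max(0.0, x)


def prelu(x, a):
    """Reference PReLU: x for x > 0, otherwise a * x."""
    return x if x > 0 else a * x


def prelu_decomposed(x, a):
    """PReLU rebuilt from ReLU, scaling, and an elementwise sum.

    Assumed mapping (not taken from the thread's script):
      positive branch: ReLU(x)
      negative branch: scale by -1, ReLU, then scale by -a
      output:          elementwise sum of the two branches
    """
    positive_branch = relu(x)
    negative_branch = -a * relu(-x)
    return positive_branch + negative_branch


samples = [-2.0, -0.5, 0.0, 0.5, 3.0]
slope = 0.25  # hypothetical learned PReLU slope
max_err = max(abs(prelu(x, slope) - prelu_decomposed(x, slope)) for x in samples)
```

Under this assumption the two functions agree for the same slope value, so the learned slope would be carried over into the scale parameter rather than learned again. Whether a given converted model needs further training still has to be checked against the original network's outputs.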

Maybe something wrong with the 2D annotations?

Hello, the link to this data no longer works. Could you share it again? Thanks.

Question of Equation 11

Thank you for your explanation. I got it.

Question about text positional encoding

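This seems odd to me. There are two things I can't figure out: 1. Why are the positional encodings for the text all 0? 2. In _get_positional_embeddings, sincos positional encoding is added to the video, and then in CogVideoXAttnProcessor2_0, RoPE positional encoding is added to the video part of query and key. Why is positional encoding applied to the video twice?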

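I looked again: in the code, self.use_positional_embeddings and self.use_learned_positional_embeddings are both False, so the sincos positional encoding should not actually be added. I still don't quite understand why the text needs no positional encoding, though. https://github.com/huggingface/diffusers/blob/de6a88c2d7659c616b44c0856677335110b8ff2e/src/diffusers/models/embeddings.py#L736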
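To make the behavior under discussion concrete, here is a simplified sketch of rotary position encoding applied only to the video part of a sequence, with the text tokens left untouched. It is a 1D illustration under stated assumptions, not the diffusers implementation: the real model works on batched tensors with multi-axis rotary embeddings, and the names `apply_rope` and `rope_video_only` are hypothetical.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by a position-dependent angle."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        out.append(vec[i] * c - vec[i + 1] * s)
        out.append(vec[i] * s + vec[i + 1] * c)
    return out


def rope_video_only(tokens, text_len):
    """Leave the first text_len tokens as they are; rotate the remaining video tokens.

    Assumes the sequence is laid out as [text tokens..., video tokens...],
    with video positions counted from 0.
    """
    text = [list(t) for t in tokens[:text_len]]
    video = [apply_rope(t, pos) for pos, t in enumerate(tokens[text_len:])]
    return text + video


# Two text tokens followed by two video tokens, each a 2-dimensional query vector.
tokens = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
mixed = rope_video_only(tokens, text_len=2)
```

In this sketch the text vectors come out unchanged, and only the video vectors at nonzero positions are rotated, which matches the observation that the rotary encoding touches only the video part of query and key.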

Question about text positional encoding

Yes, I was looking at CogVideoX, not 1.5.

Question about text positional encoding

From my reading of the code, it should be False. See this part for reference: ![Image](https://github.com/user-attachments/assets/c9217c41-ac14-4151-8c76-22731e772d5e) ![Image](https://github.com/user-attachments/assets/3d07f28f-2da7-4157-b342-8e5aa966a6ff)