Arka Sadhu
@pinkfloyd06 As I mentioned earlier, the author used `graph_max_pool(x, 4)`, so the number of nodes is divided by a factor of 4. In the computation of the Laplacian matrices, the...
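To illustrate why a pool size of 4 divides the node count by 4, here is a minimal sketch of max pooling over consecutive groups of nodes. It is not the author's implementation: the body of `graph_max_pool` below is a hypothetical stand-in, and it assumes the nodes have already been ordered so that each consecutive group of `p` nodes forms one cluster.

```python
def graph_max_pool(x, p):
    """Max-pool node features over consecutive groups of p nodes.

    x: list of per-node feature vectors (lists of floats).
    Assumption: nodes are pre-ordered so that every consecutive
    group of p nodes belongs to the same cluster.
    """
    if p <= 1:
        return [list(row) for row in x]
    if len(x) % p != 0:
        raise ValueError("number of nodes must be divisible by p")
    pooled = []
    for start in range(0, len(x), p):
        group = x[start:start + p]
        # Element-wise maximum across the p nodes of this group.
        pooled.append([max(col) for col in zip(*group)])
    return pooled


# 8 nodes with 2 features each; pooling by 4 leaves 2 nodes.
x = [[float(i), float(-i)] for i in range(8)]
pooled = graph_max_pool(x, 4)
print(len(x), "->", len(pooled))
```

With 8 input nodes and `p = 4`, the output has 2 nodes, so any graph structure used after the pooling step has to be defined on the reduced node set.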
@zmykevin Thanks for the pointer. I came across the repository, but couldn't figure out which model to use. Could you clarify how you generate the weakly aligned sentences? Again, thanks...
@zmykevin Thanks for the reply. How do you perform the retrieval for large image and text sets? Do you have any particular implementation?
@zmykevin Thanks for the reply. Do you happen to have it implemented somewhere (I couldn't find it in the repo)? Did you use the normal FlatIP (i.e. normal dot product)...
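For context, "flat" inner-product search means scoring the query against every stored vector with a plain dot product and keeping the best matches. The sketch below shows that computation in pure Python; it does not use FAISS, and the function name `top_k_inner_product` and the toy vectors are hypothetical.

```python
import heapq


def top_k_inner_product(query, database, k):
    """Exhaustive (brute-force) inner-product search.

    Returns the k best (score, index) pairs, highest score first.
    """
    scores = [
        (sum(q * d for q, d in zip(query, vec)), idx)
        for idx, vec in enumerate(database)
    ]
    return heapq.nlargest(k, scores)


# Toy embeddings (hypothetical): three database vectors, one query.
database = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [1.0, 0.0]
results = top_k_inner_product(query, database, k=2)
print(results)
```

The cost grows linearly with the number of stored vectors, which is why large image and text sets usually call for a dedicated similarity-search library rather than a loop like this one.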
@iejMac Thanks for your reply. In my current case, I have a large number of video files: I am using the WebVid-2M dataset (https://m-bain.github.io/webvid-dataset/). There are around 2M videos each with...
@iejMac I believe that even for videos with a longer time horizon, having access to the full videos is often useful. For instance, you can sample the frames with a bit of temporal jittering...
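One common way to sample frames with temporal jittering is to split the video into equal segments and pick a random frame inside each one. The following is a minimal sketch under that assumption; the function name `sample_frame_indices` and its parameters are hypothetical and not taken from any particular library.

```python
import random


def sample_frame_indices(num_frames, num_samples, jitter=True, rng=None):
    """Pick one frame index per equal-length segment of the video.

    With jitter=True the index is drawn uniformly inside each segment;
    with jitter=False the segment midpoint is used (deterministic).
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    rng = rng or random.Random()
    segment = num_frames / num_samples
    indices = []
    for i in range(num_samples):
        start = i * segment
        offset = rng.uniform(0, segment) if jitter else segment / 2
        # Clamp so the last segment never runs past the final frame.
        indices.append(min(int(start + offset), num_frames - 1))
    return indices


# Sample 4 frames from a 100-frame clip with a fixed seed.
indices = sample_frame_indices(100, 4, rng=random.Random(0))
print(indices)
```

Sampling like this needs random access to the whole clip, because the chosen indices change on every call.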
@iejMac I agree that for CLIP it is likely not that useful. However, I am not exactly training CLIP; I am trying a different model. Video models such as SlowFast often...
@ttx213 Sorry for the delayed reply. Could you clarify what command you used?
@zhangheng0311 Have you checked https://github.com/TheShadow29/VidSitu/issues/3 and https://github.com/TheShadow29/VidSitu/issues/5?
@lan-lw The log files have `cmd` and `cmd_str` fields, which refer to the command used. Let me know if it works out.