Yi Wang
Sorry for the late reply; we needed time for a CVPR submission.
viclip_text.py has been updated, and viclip_vision.py appears to be functioning properly. We encourage you to test them and report any further issues you encounter.
That module loads videos on our servers, so it may not be applicable in your case. You can remove it and update the corresponding video-loading functions accordingly. We...
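As a hedged sketch of what a local replacement loader might look like (assuming OpenCV for decoding; `load_video_frames` and `sample_frame_indices` are hypothetical names, not functions from the repo):

```python
def sample_frame_indices(num_total_frames: int, num_samples: int) -> list:
    """Evenly spaced frame indices, a common sampling scheme for video-CLIP models."""
    if num_total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples == 1:
        return [num_total_frames // 2]
    step = (num_total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def load_video_frames(path: str, num_samples: int = 8):
    """Decode a local video file with OpenCV, keeping only the sampled frames."""
    import cv2  # pip install opencv-python
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(total, num_samples))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            # OpenCV decodes BGR; most vision models expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames
```

The sampled frames can then be resized and normalized with whatever preprocessing the model config specifies.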
Good idea. We are working on it. Hope to add this feature soon.
We will update it later. For [inpaint-attnorm](https://github.com/shepnerd/AttenNorm/tree/master/inpaint-attnorm), you can partially refer to [here](https://github.com/shepnerd/inpainting_gmcnn) for its usage.
For the inpainting application, we do not use the affine parameters for denormalization. The final inpainting results reported in the paper were produced without affine parameters.
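To make the distinction concrete, here is a minimal NumPy sketch (not the actual repo code): the affine form rescales the normalized features with learnable gamma/beta, while the variant used for the final inpainting results skips that step.

```python
import numpy as np

def spatial_normalize(x, gamma=None, beta=None, eps=1e-5):
    """Normalize each (sample, channel) map over its spatial dims.

    Passing gamma/beta gives the affine form; passing None (as in the
    paper's final inpainting results) skips the affine re-scaling.
    """
    mean = x.mean(axis=(-2, -1), keepdims=True)
    var = x.var(axis=(-2, -1), keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:
        y = gamma * y
    if beta is not None:
        y = y + beta
    return y
```

Without the affine parameters, each channel map of the output simply has zero mean and (approximately) unit variance.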
We require an internal discussion to make a decision. We will provide you with an update shortly.
The provided example is an extremely difficult case. These sentences were generated by GPT from key elements extracted from the video (e.g., dog, human, snow, playing) to describe the video content, so many of them are very close in meaning to begin with. We intend to use this case study to show our interest in a deeper understanding of the subtle distinctions in motion descriptions. For ordinary video retrieval, you can get a sense of the model's effectiveness by checking its results on mainstream benchmarks or by trying some more typical examples.
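For a rough sense of how such retrieval is scored (a hedged toy sketch, not the released evaluation code), captions are ranked by cosine similarity against the video embedding; near-duplicate captions yield nearly identical scores, which is exactly why GPT-generated sentences sharing the same key elements are hard to separate:

```python
import numpy as np

def rank_captions(video_emb, text_embs):
    """Rank caption embeddings by cosine similarity to one video embedding."""
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ v
    return np.argsort(-sims), sims

# Toy embeddings: two near-duplicate captions and one unrelated caption.
video = np.array([1.0, 0.2, 0.0])
captions = np.array([
    [1.0, 0.25, 0.0],   # e.g. "a dog plays in the snow"
    [1.0, 0.15, 0.05],  # e.g. "a dog and a person play in snow"
    [0.0, 0.1, 1.0],    # unrelated caption
])
order, sims = rank_captions(video, captions)
```

The two near-duplicates land within a hair of each other, while the unrelated caption scores far lower.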
ViCLIP utilizes both vision and text transformers that are initialized from CLIP's transformers. Regarding the statement, "If vision transforms are not pre-trained, such as MAE method, then it means that...
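A hedged sketch of that initialization pattern (generic state-dict merging, not the actual ViCLIP loading code; `init_from_clip` is a hypothetical helper): parameters whose names and shapes match CLIP's checkpoint are copied over, while anything new, such as added temporal layers, keeps its fresh initialization.

```python
import numpy as np

def init_from_clip(model_state: dict, clip_state: dict) -> dict:
    """Copy CLIP weights into a model state dict wherever name and shape match."""
    merged = dict(model_state)
    for name, weight in clip_state.items():
        if name in merged and getattr(weight, "shape", None) == merged[name].shape:
            merged[name] = weight
    return merged

# Toy state dicts standing in for real checkpoints.
model = {"ln.weight": np.zeros(4), "temporal.weight": np.zeros(2)}
clip = {"ln.weight": np.ones(4), "visual.proj": np.ones(3)}
merged = init_from_clip(model, clip)
```

Here `ln.weight` is taken from the CLIP checkpoint, `temporal.weight` stays at its fresh values, and CLIP entries absent from the model (`visual.proj`) are ignored.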
Indeed, your idea seems reasonable. However, during implementation, it is crucial to ensure that the training hyperparameters are appropriately tuned. Additionally, we suggest initializing the...