Yi Wang
Sorry for the late reply; we needed time for a CVPR submission.
viclip_text.py has been updated, and viclip_vision.py appears to be functioning properly. We encourage you to test them and report any further issues you encounter.
That module loads videos on our servers, so it may not be applicable in your case. You can remove it and update the corresponding video-loading functions accordingly. We...
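As a hedged sketch of what a local replacement loader might look like (assuming OpenCV for decoding; `load_video_frames` and `sample_frame_indices` are hypothetical names, not functions from the repo):

```python
def sample_frame_indices(num_total_frames: int, num_samples: int) -> list:
    """Evenly spaced frame indices, a common sampling scheme for video-CLIP models."""
    if num_total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples == 1:
        return [num_total_frames // 2]
    step = (num_total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def load_video_frames(path: str, num_samples: int = 8):
    """Decode a local video file with OpenCV, keeping only the sampled frames."""
    import cv2  # pip install opencv-python
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(total, num_samples))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            # OpenCV decodes BGR; most vision models expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames
```

The sampled frames can then be resized and normalized with whatever preprocessing the model config specifies.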
Good idea. We are working on it. Hope to add this feature soon.
We will update it later. For [inpaint-attnorm](https://github.com/shepnerd/AttenNorm/tree/master/inpaint-attnorm), you can partially refer to [here](https://github.com/shepnerd/inpainting_gmcnn) for its usage.
For the inpainting application, we do not use the affine parameters for denormalization. The final inpainting results reported in the paper were produced without affine parameters.
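To make the distinction concrete, here is a minimal NumPy sketch (not the actual repo code): the affine form rescales the normalized features with learnable gamma/beta, while the variant used for the final inpainting results skips that step.

```python
import numpy as np

def spatial_normalize(x, gamma=None, beta=None, eps=1e-5):
    """Normalize each (sample, channel) map over its spatial dims.

    Passing gamma/beta gives the affine form; passing None (as in the
    paper's final inpainting results) skips the affine re-scaling.
    """
    mean = x.mean(axis=(-2, -1), keepdims=True)
    var = x.var(axis=(-2, -1), keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:
        y = gamma * y
    if beta is not None:
        y = y + beta
    return y
```

Without the affine parameters, each channel map of the output simply has zero mean and (approximately) unit variance.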
We require an internal discussion to make a decision. We will provide you with an update shortly.
The provided example is an extremely difficult case. These sentences were generated by GPT from key elements extracted from the video (e.g., dog, human, snow, playing) to describe the video content, so many of them are very close in meaning to begin with. We intend to use this case study to show our interest in a deeper understanding of the subtle distinctions in motion descriptions. For ordinary video retrieval, you can get a sense of the model's effectiveness by checking its results on mainstream benchmarks or by trying some more typical examples.
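For a rough sense of how such retrieval is scored (a hedged toy sketch, not the released evaluation code), captions are ranked by cosine similarity against the video embedding; near-duplicate captions yield nearly identical scores, which is exactly why GPT-generated sentences sharing the same key elements are hard to separate:

```python
import numpy as np

def rank_captions(video_emb, text_embs):
    """Rank caption embeddings by cosine similarity to one video embedding."""
    v = video_emb / np.linalg.norm(video_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ v
    return np.argsort(-sims), sims

# Toy embeddings: two near-duplicate captions and one unrelated caption.
video = np.array([1.0, 0.2, 0.0])
captions = np.array([
    [1.0, 0.25, 0.0],   # e.g. "a dog plays in the snow"
    [1.0, 0.15, 0.05],  # e.g. "a dog and a person play in snow"
    [0.0, 0.1, 1.0],    # unrelated caption
])
order, sims = rank_captions(video, captions)
```

The two near-duplicates land within a hair of each other, while the unrelated caption scores far lower.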
ViCLIP utilizes both vision and text transformers that are initialized from CLIP's transformers. Regarding the statement, "If vision transforms are not pre-trained, such as MAE method, then it means that...
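A hedged sketch of that initialization pattern (generic state-dict merging, not the actual ViCLIP loading code; `init_from_clip` is a hypothetical helper): parameters whose names and shapes match CLIP's checkpoint are copied over, while anything new, such as added temporal layers, keeps its fresh initialization.

```python
import numpy as np

def init_from_clip(model_state: dict, clip_state: dict) -> dict:
    """Copy CLIP weights into a model state dict wherever name and shape match."""
    merged = dict(model_state)
    for name, weight in clip_state.items():
        if name in merged and getattr(weight, "shape", None) == merged[name].shape:
            merged[name] = weight
    return merged

# Toy state dicts standing in for real checkpoints.
model = {"ln.weight": np.zeros(4), "temporal.weight": np.zeros(2)}
clip = {"ln.weight": np.ones(4), "visual.proj": np.ones(3)}
merged = init_from_clip(model, clip)
```

Here `ln.weight` is taken from the CLIP checkpoint, `temporal.weight` stays at its fresh values, and CLIP entries absent from the model (`visual.proj`) are ignored.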
Indeed, your idea seems reasonable. However, during implementation, it is crucial to ensure that the training hyperparameters are appropriately tuned. Additionally, we suggest initializing the...