Shubhashis Roy Dipta

21 comments by Shubhashis Roy Dipta

I also want to help in that aspect.

You can look into this paper: [Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering](https://arxiv.org/abs/2203.07633). They use multiple positives; in short, what they have done is to...
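For anyone unfamiliar with the "multiple positives" idea, here is a minimal sketch of one common way to write a contrastive loss with more than one positive per anchor. It is not taken from the linked paper and may differ from its exact formulation; the function name `multi_positive_info_nce`, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import numpy as np

def multi_positive_info_nce(anchor, candidates, positive_mask, temperature=0.1):
    """Generic InfoNCE-style loss with several positives (illustrative only).

    anchor:        (dim,) embedding of the anchor
    candidates:    (n, dim) embeddings of positives and negatives
    positive_mask: (n,) boolean array, True where a candidate is a positive
    """
    # Cosine similarity: L2-normalize both sides before the dot product.
    anchor = anchor / np.linalg.norm(anchor)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = candidates @ anchor / temperature
    # Subtract the max for numerical stability; the ratio below is unchanged.
    exp_logits = np.exp(logits - logits.max())
    # Probability mass assigned to all positives together.
    positive_mass = exp_logits[positive_mask].sum() / exp_logits.sum()
    return float(-np.log(positive_mass))

# Toy data (assumed): two candidates near the anchor, two far from it.
anchor = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]])
near_mask = np.array([True, True, False, False])
far_mask = np.array([False, False, True, True])

loss_good = multi_positive_info_nce(anchor, candidates, near_mask)
loss_bad = multi_positive_info_nce(anchor, candidates, far_mask)
print(loss_good, loss_bad)
```

The loss is low when the candidates marked as positives are the ones closest to the anchor, and high otherwise.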

@LiJunnan1992 any update here? Or any suggestions on how to use BLIP for video QA?

> We use the VQA model to generate answers:
>
> https://github.com/salesforce/BLIP/blob/48211a1594f1321b00f14c9f7a5b4813144b2fb9/models/blip_vqa.py#L85
>
> To handle videos, we simply concatenate frame features and pass them to the text decoder.

@LiJunnan1992...
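A rough sketch of what "concatenate frame features" could look like in terms of shapes. This is not BLIP's actual code: the function name `concat_frame_features`, the array layout, and the use of NumPy in place of real encoder outputs are assumptions for illustration only.

```python
import numpy as np

def concat_frame_features(frame_feats):
    """Join per-frame token features along the sequence axis.

    frame_feats: (batch, num_frames, tokens_per_frame, dim), assumed layout
    returns:     features of shape (batch, num_frames * tokens_per_frame, dim)
                 and an all-ones attention mask of shape
                 (batch, num_frames * tokens_per_frame)
    """
    batch, num_frames, tokens, dim = frame_feats.shape
    # Tokens of frame 0 come first, then frame 1, and so on.
    video_feats = frame_feats.reshape(batch, num_frames * tokens, dim)
    # Every visual token is valid, so the mask is all ones.
    attn_mask = np.ones((batch, num_frames * tokens), dtype=np.int64)
    return video_feats, attn_mask

# Toy input standing in for image-encoder outputs: 2 videos, 4 frames each.
rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((2, 4, 5, 8))
video_feats, attn_mask = concat_frame_features(frame_feats)
print(video_feats.shape, attn_mask.shape)
```

Under this assumption, the decoder would cross-attend to one long sequence of visual tokens instead of a single image's tokens, so memory grows linearly with the number of frames.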

> Any updates? Do you have a roadmap for multi-GPU support?

Any update here?

Thanks @leexinhao, any idea when the script or config will be available for [MSRVTT](https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md#zero-shot-video-text-retrieval)?

Also, loading the weights gives an error:

```
import torch
state_dict = torch.load("data/models/InternVid2/internvideo2-s2_6b-224p-f4.pt", map_location='cpu')
```

## Error:

```
:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses...
```
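As a side note on the `FutureWarning` itself: `torch.load` accepts a `weights_only` argument, and passing `weights_only=True` restricts unpickling to tensors and plain containers. The sketch below uses a tiny throwaway checkpoint, not the InternVideo2 file, and the helper name `load_state_dict_safely` is hypothetical. Whether the real checkpoint loads with `weights_only=True`, and whether the warning is related to the actual failure in the truncated output, is not something this example can confirm.

```python
import os
import tempfile

import torch

def load_state_dict_safely(path):
    # weights_only=True limits unpickling to tensors and basic containers,
    # which is the behavior the FutureWarning recommends opting into.
    return torch.load(path, map_location="cpu", weights_only=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Toy checkpoint (assumed): a plain dict of tensors, like a state_dict.
    ckpt_path = os.path.join(tmp_dir, "toy_checkpoint.pt")
    torch.save({"linear.weight": torch.zeros(2, 3)}, ckpt_path)

    state_dict = load_state_dict_safely(ckpt_path)
    loaded_shape = tuple(state_dict["linear.weight"].shape)

print(loaded_shape)
```

If a checkpoint stores arbitrary Python objects alongside the tensors, loading with `weights_only=True` can fail, so this only suits files that contain plain weights.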

> You should refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/scripts/evaluation/stage2/zero_shot/1B/eval_msrvtt.sh to test it.

I will try, but should the config be exactly the same as for 6b? 🤔🤔