Shubhashis Roy Dipta comments

Results: 21 comments by Shubhashis Roy Dipta

Translation: Bangla

I also want to help in that aspect.

multiple positive samples per query

You can look into this paper: [Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering](https://arxiv.org/abs/2203.07633). They use multiple positives; in short, what they have done is to...
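To make the "multiple positives per query" idea concrete, here is a minimal, library-free sketch of an InfoNCE-style loss averaged over several positives. It is not the linked paper's method (the comment above is truncated before describing it); the function name `multi_positive_info_nce`, the `temperature` default, and the toy vectors are all assumptions for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def multi_positive_info_nce(query, positives, negatives, temperature=0.1):
    """Hypothetical helper: mean InfoNCE loss over all positives of one query.

    Each positive is scored against the same denominator, which sums over
    every positive and every negative.
    """
    q = normalize(query)
    pos_logits = [dot(q, normalize(p)) / temperature for p in positives]
    neg_logits = [dot(q, normalize(n)) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits

    # Numerically stable log-sum-exp for the shared denominator.
    max_logit = max(all_logits)
    log_denom = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in all_logits)
    )

    # Average the negative log-probability assigned to each positive.
    return -sum(l - log_denom for l in pos_logits) / len(pos_logits)


# Toy data (assumed): two vectors close to the query, two far from it.
query = [1.0, 0.0]
close = [[0.9, 0.1], [0.8, 0.2]]
far = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = multi_positive_info_nce(query, close, far)
loss_bad = multi_positive_info_nce(query, far, close)
```

With the positives close to the query, `loss_good` should come out smaller than `loss_bad`, where the roles are swapped. In a real training setup the same computation would typically be done in batch with a tensor library.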

code for zero-shot video QA

@LiJunnan1992 Any update here? Or any suggestions on how to use BLIP for video QA?

code on text-video qa

> We use the VQA model to generate answers:
>
> https://github.com/salesforce/BLIP/blob/48211a1594f1321b00f14c9f7a5b4813144b2fb9/models/blip_vqa.py#L85
>
> To handle videos, we simply concatenate frame features and pass them to the text decoder.

@LiJunnan1992...
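The "concatenate frame features" step quoted above can be illustrated with a small library-free sketch. This is not BLIP's actual code: `concat_frame_features` is a hypothetical helper, and plain nested lists stand in for the per-frame token embeddings that an image encoder would produce. With real tensors, the equivalent would usually be a concatenation along the token dimension (for example with `torch.cat`).

```python
def concat_frame_features(frame_features):
    """Hypothetical helper: join per-frame token sequences into one sequence.

    frame_features: list of frames, each a list of token vectors
    (shape: num_frames x tokens_per_frame x dim).
    Returns one list of token vectors (shape: total_tokens x dim) that a
    text decoder could attend over as a single visual context.
    """
    dims = {len(token) for frame in frame_features for token in frame}
    if len(dims) > 1:
        raise ValueError("all token vectors must share the same dimension")
    # Keep frame order, then token order within each frame.
    return [token for frame in frame_features for token in frame]


# Toy input (assumed): 2 frames, 2 tokens per frame, dimension 2.
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
video_tokens = concat_frame_features(frames)
```

Here `video_tokens` holds four token vectors, the two from the first frame followed by the two from the second, so temporal order is preserved only through position in the sequence.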

TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded

Same issue

[Feature]: Support HuggingFaceM4/idefics2-8b as vision model

Any update here? Thanks.

Unsloth currently does not support multi GPU setups in unsloth-2024.8

> Any updates? Do you have a roadmap for multi-GPU support?

Any update here?

Request for InternVideo2-stage2 6b model

Thanks @leexinhao, any idea when the script or config will be available for [MSRVTT](https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md#zero-shot-video-text-retrieval)?

Request for InternVideo2-stage2 6b model

Also, loading the weights gives an error:

```
import torch
state_dict = torch.load("data/models/InternVid2/internvideo2-s2_6b-224p-f4.pt", map_location='cpu')
```

## Error:

```
:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses...
```

Request for InternVideo2-stage2 6b model

> You should refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/scripts/evaluation/stage2/zero_shot/1B/eval_msrvtt.sh to test it.

I will try, but should the config be totally the same as 6b? 🤔🤔