CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Hello, thanks for providing the code! Regarding the models used to generate the results in the paper, are these uploaded anywhere that they can be shared? I would be...
How long have you been training?
In your paper, you conducted experiments for 5 epochs. In reference to this issue (https://github.com/ArrowLuo/CLIP4Clip/issues/36), it is mentioned that you reported performance based on the best scores on the validation...
Can you provide the weights after post-pretraining on HowTo100M, or the weights fine-tuned for the downstream tasks? Due to limited computing resources, I would like to obtain...
I edited two main things: 1. Deleted the `loss.mean()` call, which does nothing; DDP already provides automatic gradient synchronization. 2. Following this comment, https://github.com/openai/CLIP/issues/132#issuecomment-908004353, we do every similarity calculation locally....
https://github.com/ArrowLuo/CLIP4Clip/blob/508ffa3de39ba0563a03199c440ab602a72e9b6f/modules/modeling.py#L400
```python
if self.training:
    visual_output = allgather(visual_output, self.task_config)
    video_mask = allgather(video_mask, self.task_config)
    sequence_output = allgather(sequence_output, self.task_config)
    torch.distributed.barrier()

visual_output = visual_output / visual_output.norm(dim=-1, keepdim=True)
visual_output = self._mean_pooling_for_similarity_visual(visual_output, video_mask)
visual_output = visual_output...
```
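For context, here is a minimal sketch of the all-gather pattern used above: an autograd function that gathers features from every process but routes gradients back only to the local shard. This is an assumption-level illustration; the repository's own `allgather` helper also takes a `task_config` argument and may differ in details.

```python
import torch
import torch.distributed as dist

class AllGather(torch.autograd.Function):
    """Gather a tensor from all processes while keeping gradients for the local slice."""

    @staticmethod
    def forward(ctx, tensor):
        ctx.rank = dist.get_rank()
        ctx.batch_size = tensor.shape[0]
        gathered = [torch.empty_like(tensor) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, tensor)
        # Concatenate along the batch dimension so the similarity matrix can be
        # computed over the full (global) batch on every rank.
        return torch.cat(gathered, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass back only the gradient slice corresponding to the rows this
        # process contributed in forward.
        start = ctx.rank * ctx.batch_size
        return grad_output[start:start + ctx.batch_size]

allgather = AllGather.apply  # usage sketch: visual_output = allgather(visual_output)
```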
Hi, I'm a beginner and would like to ask a question: what do Pair, L, and T stand for in the code, and what do they mean?
```python
# Pair x L...
```
In main_task_retrieval.py, function "train_epoch", we can see:
```python
if n_gpu > 1:
    loss = loss.mean()  # mean() to average on multi-gpu.
```
But in modeling.py, there is:
```python...
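As a hypothetical illustration of why that `.mean()` only matters under `nn.DataParallel` (which stacks one loss per GPU replica) and is effectively a no-op under DistributedDataParallel (one scalar loss per process, with gradients averaged during backward):

```python
import torch

# DataParallel-style: the wrapper returns one loss value per GPU replica,
# so .mean() is needed to reduce them to a single scalar before backward().
dp_losses = torch.tensor([0.7, 0.9])  # hypothetical losses from 2 replicas
print(dp_losses.mean())               # tensor(0.8000)

# DDP-style: each process already holds a single scalar loss, and gradient
# averaging happens automatically inside backward(), so .mean() changes nothing.
ddp_loss = torch.tensor(0.8)
print(ddp_loss.mean())                # tensor(0.8000), a no-op on a 0-dim tensor
```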
Sorry to disturb you. When I reproduce the results on the LSMDC dataset, I get worse results than those in the paper. In the meanP experiment, the meanR is always around 200,...
Hi authors! Thanks for the great work! I saw that this paper is evaluated on all kinds of video-to-text datasets. The CLIP model itself works pretty well for image-to-image retrieval, despite...