How can I get MovieChat-1K dataset?

Open Aurorana opened this issue 1 year ago • 42 comments

You guys did a great job. I would like to use your dataset to test other models. How can I get the MovieChat-1K dataset?

Aurorana avatar Jan 09 '24 07:01 Aurorana

We will release the train and val set of MovieChat-1K next week.

Espere-1119-Song avatar Jan 09 '24 14:01 Espere-1119-Song

> We will release the train and val set of MovieChat-1K next week.

Hello, how can I download the train and val set?

typ1012 avatar Jan 17 '24 06:01 typ1012

Feature extraction is nearly complete. However, downloading the dataset locally and uploading it to Hugging Face will still take some time, more than we initially estimated. We are working to speed up the data release and expect to confirm it by February.

Espere-1119-Song avatar Jan 17 '24 06:01 Espere-1119-Song

Hello team, have you released the dataset? It is already February.

KerolosAtef avatar Feb 11 '24 19:02 KerolosAtef

Sorry, we are still working on the data upload problem. For some reason, we cannot upload data directly to Hugging Face. I will upload a few to Hugging Face today (nearly 900 in total).

Espere-1119-Song avatar Feb 12 '24 07:02 Espere-1119-Song

Okay, thanks for your help, I'm waiting.

KerolosAtef avatar Feb 12 '24 10:02 KerolosAtef

I have uploaded the CLIP features of one video to Hugging Face. We extract 8192 frames from each video, and each HDF5 file stores 64 frames.
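
To make the layout concrete: 8192 frames at 64 frames per file works out to 128 HDF5 files per video. The Python sketch below shows that arithmetic. It is not code from the MovieChat repository; `locate_frame` is a hypothetical helper, and it assumes that files hold consecutive frames in order with 0-based indexing, which this thread does not confirm.

```python
from typing import Tuple

# Numbers stated in the thread: 8192 frames per video, 64 frames per HDF5 file.
FRAMES_PER_VIDEO = 8192
FRAMES_PER_FILE = 64

# Derived: how many HDF5 files are needed to cover one video.
FILES_PER_VIDEO = FRAMES_PER_VIDEO // FRAMES_PER_FILE


def locate_frame(frame_idx: int) -> Tuple[int, int]:
    """Map a 0-based frame index to (file index, offset inside that file).

    Assumption (not confirmed by the thread): files store consecutive
    frames in order, so file 0 holds frames 0..63, file 1 holds 64..127, etc.
    """
    if not 0 <= frame_idx < FRAMES_PER_VIDEO:
        raise ValueError(f"frame_idx must be in [0, {FRAMES_PER_VIDEO})")
    return divmod(frame_idx, FRAMES_PER_FILE)


print(FILES_PER_VIDEO)    # 128
print(locate_frame(130))  # (2, 2)
```

If the actual files are ordered or named differently, only the mapping inside `locate_frame` would need to change; the count of 128 files per video follows directly from the two numbers above.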

Espere-1119-Song avatar Feb 12 '24 11:02 Espere-1119-Song

Hi @Espere-1119-Song,

Thank you for this awesome work. Can you please upload the frames for all the videos that you used in your evaluation? If you can't upload the frames, can you please upload the CLIP features for every video? I would really appreciate this, as we are on a time crunch and need to evaluate on your benchmark as soon as possible. I need all the videos in the eval dataset so that I can run a proper evaluation on your benchmark. I am looking forward to hearing from you!

Thank you!! :)

essamsleiman avatar Feb 12 '24 13:02 essamsleiman

We will release the CLIP features for every video, covering 8192 frames per video. Since we are holding a workshop at CVPR 2024, we will release the test set later for a fair comparison.

We are working to upload the training set to Hugging Face as quickly as we can. For now, you can use the training set in your paper. We found that performance is similar if you are using the zero-shot setting.

Espere-1119-Song avatar Feb 13 '24 02:02 Espere-1119-Song

Hi @Espere-1119-Song, thanks for getting back to me! The train set is fine for our evaluation. When will it be released? I can't find it here: https://huggingface.co/datasets/Enxin/MovieChat-1K_train

Thanks once again :)

essamsleiman avatar Feb 13 '24 02:02 essamsleiman

You can find an example under "Files" and use it to debug first. The eval set includes 100 videos. Although we cannot guarantee that we will upload the entire train set quickly, we can upload 100 videos within this week.

Espere-1119-Song avatar Feb 13 '24 02:02 Espere-1119-Song

Thank you @Espere-1119-Song! 100 train set videos work for us. I am looking forward to the 100 videos this week!

Best, Essam

essamsleiman avatar Feb 13 '24 02:02 essamsleiman

@essamsleiman, @KerolosAtef: over 100 videos and their annotations have been uploaded to Hugging Face and can be used for evaluation.

Espere-1119-Song avatar Feb 17 '24 05:02 Espere-1119-Song

Thank you very much @Espere-1119-Song. Just to verify: are these the videos that were used for Table 3 in the paper (the MovieChat-1K test set)?

KerolosAtef avatar Feb 17 '24 11:02 KerolosAtef

Since we are holding a workshop at CVPR 2024, we will release the test set, which is used for Table 3 in the paper, later for a fair comparison. For now, you can use the training set in your paper. We found that performance is similar if you are using the zero-shot setting.

Espere-1119-Song avatar Feb 17 '24 14:02 Espere-1119-Song

Okay, thank you so much for your help

KerolosAtef avatar Feb 17 '24 17:02 KerolosAtef

@Espere-1119-Song How can I clone the dataset over SSH? When I follow the instructions to clone the repository using git-lfs, it seems to get stuck after downloading the json folder and doesn't download the movie tar files.

zpx01 avatar Feb 23 '24 21:02 zpx01

Because the tar files are very large, it is normal for the clone to appear stuck while downloading the movie tar files.

Espere-1119-Song avatar Feb 24 '24 04:02 Espere-1119-Song

@Espere-1119-Song When can we expect the full training set to be available on HuggingFace?

zpx01 avatar Feb 26 '24 04:02 zpx01

All annotation files have now been uploaded. The feature extraction files for the training videos are being uploaded gradually, and we expect all of them to be uploaded by around mid-March.

Espere-1119-Song avatar Feb 26 '24 04:02 Espere-1119-Song

@Espere-1119-Song thanks for the quick response! I was also wondering if you could point me to the code in the repository that can be used to get started with loading/using the extracted CLIP features?

zpx01 avatar Feb 26 '24 05:02 zpx01

Sorry, there is no code in the current repo for directly loading or using the extracted features. You can use the dataset without loading vit_model in https://github.com/rese1f/MovieChat/blob/main/MovieChat/models/moviechat.py. Note that we extract 8192 frames for each video.

Espere-1119-Song avatar Feb 26 '24 05:02 Espere-1119-Song

Thank you for your exceptional contribution. Could I inquire whether it would be feasible for you to provide access to the unedited video or the frames?

patrick-tssn avatar Mar 04 '24 06:03 patrick-tssn

Sorry, the unedited videos and the frames will be used for the workshop we are holding at CVPR 2024, so we will release the test set later for a fair comparison.

Espere-1119-Song avatar Mar 04 '24 06:03 Espere-1119-Song

> Sorry, the unedited videos and the frames will be used for the workshop we are holding at CVPR 2024, so we will release the test set later for a fair comparison.

Thank you for your response. Could you please provide a more precise timeline for the release of your dataset?

patrick-tssn avatar Mar 07 '24 02:03 patrick-tssn

We have released both the Train Set and the Test Set of MovieChat-1K.

Espere-1119-Song avatar Mar 14 '24 08:03 Espere-1119-Song

@Espere-1119-Song Is the train set still only the CLIP features? Is there any plan to release the full videos?

zpx01 avatar Mar 14 '24 09:03 zpx01

Due to copyright restrictions, we share the CLIP features extracted by eva_vit_g, covering 8192 frames of each video.

Espere-1119-Song avatar Mar 14 '24 09:03 Espere-1119-Song

> Due to copyright restrictions, we share the CLIP features extracted by eva_vit_g, covering 8192 frames of each video.

Does this mean that the complete videos or the extracted frames will never be released at any point in the future?

patrick-tssn avatar Mar 14 '24 09:03 patrick-tssn

We release the complete videos for the test set. For the train set, we release only the extracted frame features and may not release the raw videos in the future.

Espere-1119-Song avatar Mar 14 '24 09:03 Espere-1119-Song