MovieChat
How can I get MovieChat-1K dataset?
You guys did a great job, I would like to use your dataset to test other models, how can I get MovieChat-1K dataset?
We will release the train and val set of MovieChat-1K next week.
Hello, how can I download the train and val set?
Currently, feature extraction is nearly complete. However, downloading the dataset locally and uploading it to Hugging Face will still take some time, which exceeds our initial estimate. We are working diligently to expedite the data release and anticipate confirming the release by February.
Hello team, have you released the dataset yet? We are in February now.
Sorry, we are still solving a problem with the data upload. For certain reasons, we cannot upload the data directly to Hugging Face. I will upload a few videos to Hugging Face today (there are nearly 900 in total).
Okay, thanks for your help, I'm waiting.
I have uploaded the CLIP features of one video to Hugging Face. 8192 frames are extracted from each video, and each HDF5 file stores 64 frames.
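Given that layout (8192 frames per video, 64 frames per HDF5 file, i.e. 128 files per video), reassembling a full video's features might look like the sketch below. The dataset key name `"features"` and the file naming are assumptions, not the repo's documented layout; inspect one uploaded file with `h5py` first to confirm the actual keys and shapes.

```python
import os
import tempfile

import h5py
import numpy as np

def load_video_features(paths, key="features"):
    """Concatenate per-file 64-frame chunks into one (n_frames, feat_dim) array.

    `paths` is the list of HDF5 files belonging to a single video; sorting
    assumes the filenames encode chunk order.
    """
    chunks = []
    for path in sorted(paths):
        with h5py.File(path, "r") as f:
            chunks.append(np.asarray(f[key]))  # each chunk: (64, feat_dim)
    return np.concatenate(chunks, axis=0)

# Demo with two dummy 64-frame chunks (feat_dim=16 is illustrative only;
# the real CLIP feature dimension will differ).
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"chunk_{i:03d}.h5")
    with h5py.File(p, "w") as f:
        f.create_dataset("features", data=np.random.rand(64, 16).astype("float32"))
    paths.append(p)

feats = load_video_features(paths)
print(feats.shape)  # (128, 16)
```

For a real video you would pass all 128 chunk files, yielding a (8192, feat_dim) array.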
Hi @Espere-1119-Song,
Thank you for this awesome work. Can you please upload the frames for all the videos you used in your evaluation? If you can't upload the frames, could you please upload the CLIP features for every video instead? I would really appreciate it, as we are on a time crunch and need to evaluate on your benchmark as soon as possible. I need all the videos in the eval dataset so that I can run a proper evaluation. I am looking forward to hearing from you!
Thank you!! :)
We will release the CLIP features for every video, containing 8192 frames of each video. Since we are holding a workshop at CVPR 2024, we will release the test set later for a fair comparison.
We are hurrying to upload the training set to Hugging Face. You can use the training set in your paper for now; we found that the performance is similar in the zero-shot setting.
Hi @Espere-1119-Song thanks for getting back to me! The train set is fine for our evaluation, when will it be released? I can't find it here: https://huggingface.co/datasets/Enxin/MovieChat-1K_train
Thanks once again :)
You can find an example under "Files" and use it to debug first. The eval set includes 100 videos. Although we cannot guarantee that we will upload the entire train set quickly, we can upload 100 videos within this week.
Thank you @Espere-1119-Song! 100 train set videos works for us. I am looking forward to the 100 videos within this week!
Best, Essam
@essamsleiman , @KerolosAtef Over 100 videos and respective annotations have been uploaded to Huggingface, which can be used for evaluation.
Thank you very much @Espere-1119-Song. Just to verify: are these the videos used for Table 3 in the paper (the MovieChat-1K test set)?
Since we are holding a workshop at CVPR 2024, we will release the test set (which is used for Table 3 in the paper) later for a fair comparison. You can just use the training set in your paper for now. We found that the performance is similar if you are using the zero-shot setting.
Okay, thank you so much for your help
@Espere-1119-Song How can I clone the dataset over SSH? When I try to use the instructions to clone the repository using git-lfs it seems to get stuck after downloading the json folder and it doesn't download the movie tar files.
Because the tar files are very large, it is normal for the download to appear stuck on the movie tar files.
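One common way to avoid the clone stalling on the large tars is to skip the LFS smudge at clone time and then pull files selectively. This is a sketch using standard git-lfs options; the folder/file patterns below are illustrative and should be matched against the repo's actual file listing.

```shell
# Clone without downloading LFS payloads (only pointer files are fetched)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/Enxin/MovieChat-1K_train
cd MovieChat-1K_train

# Pull the small annotation files first (pattern is illustrative)
git lfs pull --include="jsons/*"

# Then fetch the large tars one at a time as needed
git lfs pull --include="movies/*.tar"
```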
@Espere-1119-Song When can we expect the full training set to be available on HuggingFace?
Currently, all annotation files have been uploaded. The feature files of the training videos are being uploaded gradually, and we expect all of them to be uploaded around mid-March.
@Espere-1119-Song thanks for the quick response! I was also wondering if you could point me to the code in the repository that can be used to get started with loading/using the extracted CLIP features?
Sorry, there is no code for directly loading/using the extracted features in the current repo. You can use the dataset without loading the vit_model in https://github.com/rese1f/MovieChat/blob/main/MovieChat/models/moviechat.py. Note that we extract 8192 frames for each video.
Thank you for your exceptional contribution. Could I inquire whether it would be feasible for you to provide access to the unedited video or the frames?
Sorry, the unedited videos and the frames will be used for a workshop we are holding at CVPR 2024; we will release the test set later for a fair comparison.
Thank you for your response. Could you please provide a more precise timeline for the release of your dataset?
@Espere-1119-Song Is the train set still only the CLIP features? Is there any plan to release the full videos?
Due to copyright restrictions, we share the CLIP features extracted by eva_vit_g, covering 8192 frames of each video.
Does this mean that the complete videos or the extracted frames will never be released at any point in the future?
We release the complete videos in the test set. For the train set, we only release the extracted frame features, and we may not release the raw videos in the future.