About the source videos of videochat1, videochat2, and videochatgpt
I am planning to download the video datasets used by videochat2, but a few points are confusing.
As mentioned in DATA.md, videochat1, videochat2, and videochatgpt are based on InternVid.
In the train.json files for videochat1/videochat2/videochatgpt, videos are referenced like 000001_000050/1066682446.mp4, but every video in the InternVid dataset is identified only by a YouTube_id_str.
How do I build the mapping?
Also, is there a way to download only the videos related to videochat1/videochat2/videochatgpt?
The videos of videochat1 in the instruction dataset were re-annotated from WebVid, so you can go to WebVid to obtain the corresponding download links. If you have downloaded the original annotation files for WebVid, the CSV should contain the download URL for each video. You can use tools like wget or requests to download them, and each folder should contain its respective videos. The video path is composed of page_dir (the folder) and videoid (the file name).
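To illustrate the mapping described above, here is a minimal sketch that turns rows of the WebVid annotation CSV into `page_dir/videoid.mp4` paths. The column names `videoid`, `contentUrl`, and `page_dir` are assumptions about the CSV header, the helper name is hypothetical, and the URL in the sample row is a placeholder, so check them against the file you actually downloaded.

```python
import csv
import io


def webvid_path_to_url(csv_file):
    """Map '<page_dir>/<videoid>.mp4' to the download URL in each CSV row."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        # Column names are assumptions about the WebVid annotation CSV.
        rel_path = f"{row['page_dir']}/{row['videoid']}.mp4"
        mapping[rel_path] = row["contentUrl"]
    return mapping


# Tiny in-memory stand-in for the annotation CSV; the URL is a placeholder.
sample_csv = io.StringIO(
    "videoid,contentUrl,page_dir\n"
    "1066682446,https://example.com/preview.mp4,000001_000050\n"
)
path_to_url = webvid_path_to_url(sample_csv)
print(path_to_url["000001_000050/1066682446.mp4"])
```

Looking up each path from train.json in such a mapping gives the URLs to fetch with wget or requests, so only the videos that are actually referenced need to be downloaded.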
The Video Conversation data in VideoChat2 is based on InternVid, which you can download from InternVid. Alternatively, you can download the clips based on the video name: the first 11 characters of the video name are the YouTube ID, and the content after the underscore _ is the start time of the segment, which has a duration of 10 seconds.
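The naming rule above could be parsed roughly as follows. This is only a sketch: the function name and the sample clip name are made up for illustration, the `.mp4` suffix handling is an assumption, and the start time is kept as a raw string because its exact format is not stated here.

```python
def parse_internvid_name(video_name):
    """Split a clip name into (youtube_id, start) following the rule above."""
    stem = video_name.rsplit("/", 1)[-1]
    if stem.endswith(".mp4"):
        stem = stem[: -len(".mp4")]
    # Slice instead of splitting on "_": YouTube IDs may contain "_" or "-".
    youtube_id, separator, start = stem[:11], stem[11:12], stem[12:]
    if len(youtube_id) != 11 or separator != "_" or not start:
        raise ValueError(f"unexpected clip name: {video_name!r}")
    return youtube_id, start


CLIP_DURATION_SECONDS = 10  # segment length stated in the reply above

# Hypothetical clip name, made up for illustration only.
print(parse_internvid_name("a_b-cDEF123_35.mp4"))
```

Slicing the first 11 characters avoids mis-parsing IDs that themselves contain an underscore.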
For the data used in VideoChatGPT, you can directly access them through the shared link or obtain them based on the ID from ActivityNet.
Thanks for your kind reply.
For the data used in VideoChatGPT, you can directly access them through the shared link or obtain them based on the ID from ActivityNet.
This link seems invalid...
I can open this link normally. You may also be able to find this dataset through the VideoChatGPT GitHub repo:
https://mbzuaiac-my.sharepoint.com/personal/hanoona_bangalath_mbzuai_ac_ae/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fhanoona%5Fbangalath%5Fmbzuai%5Fac%5Fae%2FDocuments%2FVideo%2DChatGPT%2FData%5FCode%5FModel%5FRelease%2FData%2Ftraining%5Fvideos&ga=1
How about this share link? https://mbzuaiac-my.sharepoint.com/:f:/g/personal/hanoona_bangalath_mbzuai_ac_ae/EnLRDehrr8lGqHpC5w1zZ9QBnsiVffYy5vCv8Hl14deRcg?e=Ul5DUE
This link opens correctly. Thanks a lot.
Also, I was wondering if you can share videochat2 videos in a similar manner? Many videos are not downloading properly.
I'm sorry, we do not have the copyright for these videos and cannot directly share them.
Sorry to bother you again. The video_id in caption/videochat/train.json looks like 000001_000050/1066682446.mp4, but the videos from InternVid are all named by YouTube ID. How do I build this mapping?
Also, the vChat QR code has expired; please update it.
The videos of videochat1 in the instruction dataset were re-annotated from WebVid, and you can go to WebVid to obtain the corresponding download links. If you have downloaded the original annotation files for webvid, the CSV should contain the download URLs for each video. You can use tools like wget or request to download them, and each folder should have the respective videos. The name of the video is composed of page_dir and videoid.
@peiliu0408 The instruction data of videochat1 is sourced from WebVid; you can refer to here to obtain the data. Thank you for the reminder. We will update the group QR code as soon as possible. Until then, you can scan the WeChat QR code of "GV小助手" and she will add you to the vChat WeChat group.
So the task caption/videochat/train.json is also sampled from the WebVid dataset? But
For videochat1, it was sampled from WebVid. For videochat2, it was sampled from InternVid. The links in the screenshots point to the InternVideo-data link.
In detail, video_caption/videochat/train.json is caption data sampled from WebVid. The video_conversation/videochat2 is sampled from the instruction dialogue data of InternVid while video_conversation/videochat1 is still sampled from the instruction dialogue data of WebVid.
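For quick reference, the correspondence stated above can be written down as a small lookup. This is only a sketch of what was said in this thread: the helper name is hypothetical, and the file name under video_conversation/videochat2 in the usage line is assumed for illustration.

```python
# Annotation directory -> source video dataset, as stated in this thread.
ANNOTATION_SOURCES = {
    "video_caption/videochat": "WebVid",
    "video_conversation/videochat1": "WebVid",
    "video_conversation/videochat2": "InternVid",
}


def source_dataset(annotation_path):
    """Return the source dataset for an annotation path, or None if unknown."""
    for prefix, dataset in ANNOTATION_SOURCES.items():
        # Require a "/" after the prefix so "videochat" does not match "videochat1".
        if annotation_path == prefix or annotation_path.startswith(prefix + "/"):
            return dataset
    return None


# The file name below is an assumed example path.
print(source_dataset("video_conversation/videochat2/train.json"))
```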
Thank you for pointing that out, we will update the corresponding pages ASAP.
Hi, the videos provided in the VideoChatGPT SharePoint folder (training_videos) seem incomplete (about 700 videos are missing). Also, I wonder why the json file for videochatgpt contains video paths like "train/xxx.mp4" and "val/yyy.mp4", while I cannot see these subfolders in the files downloaded from the SharePoint link. Where does "val/" come from? There are two additional zip files in the SharePoint folder, "ActivityNet_Train_Videos_Video-ChatGPT.zip" and "ActivityNet_Test-1-3_videos.zip"; should I download them as well? Thanks in advance.
Edit: it seems that "ActivityNet_Train_Videos_Video-ChatGPT.zip" contains some of the missing videos, but about 160 videos are still missing. Can you share how you obtained the video data for videochatgpt? Thanks.
We directly use the previously obtained ActivityNet dataset. If you have downloaded this dataset before, you can use it directly. For the missing videos, you can ask the author of VideoChatGPT :)
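If it helps when reporting missing files, a check like the one below lists which annotated paths are absent from a local copy. It is a minimal sketch: the function name is hypothetical, the paths are the placeholder names quoted above, and extracting the path list from the json file is left out because its exact structure is not described in this thread.

```python
from pathlib import Path
import tempfile


def find_missing(video_root, relative_paths):
    """Return the annotated relative paths that have no file under video_root."""
    root = Path(video_root)
    return sorted(p for p in set(relative_paths) if not (root / p).is_file())


# Demo on a temporary directory; file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    (Path(tmp) / "train" / "xxx.mp4").touch()
    missing = find_missing(tmp, ["train/xxx.mp4", "val/yyy.mp4"])
    print(missing)
```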
Hi, I have the same question. How are the train/ and val/ divisions in the VideoChatGPT dataset obtained? Do they correspond to ActivityNet_Train_Videos_Video-ChatGPT.zip and ActivityNet_Test-1-3_videos.zip respectively? Have you solved it? Thanks for any suggestions!
