Panda-70M

The error occurs when downloading with the argument "--clip_col='timestamp'".

Open peiliu0408 opened this issue 1 year ago • 5 comments

Before downloading the full dataset, I ran an experiment with 200 videos. Without the argument "--clip_col='timestamp'", these videos downloaded correctly (I ended up with 193 of the 200).

But an error occurred when this argument was included: in the subsequent experiment, none of the videos were downloaded correctly.

[Screenshot 2024-03-07 16 51 31] [Screenshot 2024-03-07 16 52 05]

I am new to this tool, so I am unsure if there might be any formatting errors in the timestamp.

peiliu0408 avatar Mar 07 '24 08:03 peiliu0408

It seems to me you are using the upstream video2dataset tool instead of the one included in this repository. The upstream tool does not work with the provided .csv files: they store the timestamps as Python arrays, which the normal video2dataset does not parse properly, so each one gets passed as a string in clip_spans.

If you want to use the standard video2dataset tool, you might want to convert the csv files to proper JSON. It would also be nice if this repo documented the changes to its "vendored" copy of video2dataset and made it clear that the normal command does not work. Maybe also consider publishing the metadata in a better format than csv.
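As a rough illustration of the conversion suggested above, here is a minimal Python sketch that re-encodes a column holding Python list literals as JSON arrays. The column names (videoID, timestamp) and the sample row are assumptions made for the example, not taken from the actual Panda-70M files, and whether upstream video2dataset then accepts the result is not verified here.

```python
import ast
import csv
import io
import json


def python_list_to_json(cell: str) -> str:
    """Re-encode a cell holding a Python list literal as a JSON array string."""
    # ast.literal_eval only evaluates literals, so it is safe on untrusted text.
    return json.dumps(ast.literal_eval(cell))


def convert_rows(reader, columns):
    """Yield each row with the named columns re-encoded as JSON."""
    for row in reader:
        for col in columns:
            row[col] = python_list_to_json(row[col])
        yield row


# Hypothetical sample row; the real csv layout may differ.
sample = io.StringIO(
    'videoID,timestamp\n'
    'abc123,"[[\'0:00:01.000\', \'0:00:05.500\']]"\n'
)
rows = list(convert_rows(csv.DictReader(sample), ["timestamp"]))
print(rows[0]["timestamp"])  # [["0:00:01.000", "0:00:05.500"]]
```

The single-quoted Python strings become double-quoted JSON strings, so a JSON parser reads the cell as a nested list rather than as one opaque string.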

pabl0 avatar Mar 07 '24 16:03 pabl0

@peiliu0408 @pabl0 Yes, it seems you are using the upstream video2dataset for downloading, but it does not work on the Panda-70M csv files. Please check here for the reason we needed to modify the video2dataset tool, and please use the video2dataset in this repo to download the dataset.

tsaishien-chen avatar Mar 07 '24 19:03 tsaishien-chen

Thanks for your reply! Just to confirm: I installed the official video2dataset version, and now I'd like to reinstall the tool from this repository.

peiliu0408 avatar Mar 07 '24 23:03 peiliu0408

Yes, please try to uninstall the original video2dataset and reinstall our version!

tsaishien-chen avatar Mar 07 '24 23:03 tsaishien-chen

I have some questions regarding how to set the optimal processes_count and thread_count in the config file for a machine with 8 cores and 16GB of memory. Can you help me with this?

peiliu0408 avatar Mar 08 '24 03:03 peiliu0408