GroupViT

Mistakes in the GCC Dataset download commands

Open PardoAlejo opened this issue 3 years ago • 1 comment

Hi! I noticed that the commands for downloading GCC3M and GCC12M have a couple of typos. The corrected version for GCC12M is below:

sed -i '1s/^/url\tcaption\n/' gcc12m.tsv
img2dataset --url_list gcc12m.tsv --input_format "tsv" \
            --url_col "url" --caption_col "caption" --output_format webdataset \
            --output_folder local_data/gcc12m_shards \
            --processes_count 16 --thread_count 64 \
            --image_size 512 --resize_mode keep_ratio --resize_only_if_bigger True \
            --enable_wandb True --save_metadata False --oom_shard_count 6
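
One caveat with the `sed` line above: it prepends the header every time it runs, so re-running the script would add a second header row. Below is a minimal sketch of an idempotent alternative, assuming a POSIX shell. `add_header` is a hypothetical helper name used only for illustration, not something from the GroupViT repository.

```shell
# Hypothetical helper: prepend the "url<TAB>caption" header only if it is missing.
add_header() {
    file="$1"
    header="$(printf 'url\tcaption')"
    # Compare the current first line with the expected header.
    if [ "$(head -n 1 "$file")" != "$header" ]; then
        # Write the header followed by the original contents, then replace the file.
        { printf 'url\tcaption\n'; cat "$file"; } > "$file.tmp" && mv "$file.tmp" "$file"
    fi
}
```

Calling `add_header gcc12m.tsv` before `img2dataset` should then be safe to repeat.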

It would be nice if you could update the README.md with it. Great work! Thanks for sharing it.

PardoAlejo avatar Mar 30 '22 17:03 PardoAlejo

Thx! Will do!

xvjiarui avatar Mar 30 '22 23:03 xvjiarui