DALLE-datasets
add a table with dataset sizes
Having a table of the datasets with some information about size and time to download would be useful. https://docs.google.com/document/d/1KCAB-OTHphcCh-4oITIL8r7ih-HuslMKX1Rls_P03CY/edit could serve as complementary information.
I will add information here as I download things. Starting with CC3M, I intend to download it, then produce some CLIP embeddings (using https://github.com/rom1504/clip-retrieval/) and a list of CLIP-filtered files.
Once it's clear enough, I will open a PR to the README.
I downloaded cc3m and cc12m (improving their script a bit in the process)
- cc3m took 20h and resulted in 100GB of resized images, 5.6M of them, resized so that one dimension is 320 and the other is larger
- cc12m took 10h and resulted in 300GB of resized images, 10.7M of them, at size 256
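To make the two runs easier to compare, here is a rough sketch that turns the figures above into download rate and average file size. It only uses the numbers quoted in this comment; treating 1GB as 10^9 bytes is an assumption, and the helper name `summarize` is just for illustration.

```python
# Figures quoted in the list above (1 GB taken as 10**9 bytes, an assumption).
runs = {
    "cc3m": {"hours": 20, "gigabytes": 100, "images": 5.6e6},
    "cc12m": {"hours": 10, "gigabytes": 300, "images": 10.7e6},
}


def summarize(run):
    """Return (images per second, average kilobytes per image) for one run."""
    images_per_second = run["images"] / (run["hours"] * 3600)
    kb_per_image = run["gigabytes"] * 1e9 / run["images"] / 1e3
    return images_per_second, kb_per_image


for name, run in runs.items():
    rate, size = summarize(run)
    print(f"{name}: {rate:.0f} images/s, {size:.1f} KB/image")
```

These are averages over the whole run, so they say nothing about where the time went (network, resizing, or file system).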
cc3m can obviously take way less time if using the improved script of cc12m. I confirmed in the process that handling millions of files is painful, so I will make it possible to download directly as a collection of tars (== webdataset format).
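For reference, a minimal sketch of what "a collection of tars" could look like, using only Python's standard library. The function name `write_shards`, the shard naming, and the `.jpg`/`.txt` pairing are assumptions for illustration, not the script used for the downloads above. The idea is that files sharing a basename inside a tar are treated as one sample, so millions of small files become a few thousand shard files.

```python
import io
import tarfile
from pathlib import Path


def write_shards(samples, out_dir, samples_per_shard=1000):
    """Pack (key, image_bytes, caption) triples into numbered tar shards.

    Each sample is stored as key.jpg plus key.txt; files sharing a basename
    are read back together as one sample by webdataset-style loaders.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    tar = None
    for index, (key, image_bytes, caption) in enumerate(samples):
        # Start a new shard every `samples_per_shard` samples.
        if index % samples_per_shard == 0:
            if tar is not None:
                tar.close()
            path = out_dir / f"{len(shard_paths):05d}.tar"
            tar = tarfile.open(path, "w")
            shard_paths.append(path)
        members = ((".jpg", image_bytes), (".txt", caption.encode("utf-8")))
        for suffix, payload in members:
            info = tarfile.TarInfo(name=key + suffix)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    if tar is not None:
        tar.close()
    return shard_paths
```

Usage would be something like `write_shards(my_samples, "cc3m_shards")`, where `my_samples` is any iterable of `(key, image_bytes, caption)` triples produced by the downloader.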
@rom1504 the doc is not available now.
I want to download the data, can you please help me?
I only found the download_open_images.txt file in the repo. How do I download the data using a text file?