img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.
Hi, when I download the LAION2B-multi dataset using Spark, the job stops at some point without reporting an error. I then rerun the code with `incremental_mode="incremental"` and get...
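For readers unfamiliar with resuming an interrupted download, the idea behind an incremental rerun can be sketched as follows. This is a minimal illustration, not img2dataset's actual implementation: the helper `pending_shards`, the `demo` function, and the `<shard_id>_stats.json` completion-marker naming are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def pending_shards(output_folder, total_shards):
    """Return the shard ids that have no completion marker in output_folder.

    Assumption for this sketch: a finished shard leaves behind a file named
    "<shard_id>_stats.json", where <shard_id> is a zero-padded integer.
    """
    done = set()
    for marker in Path(output_folder).glob("*_stats.json"):
        prefix = marker.name.split("_")[0]
        if prefix.isdigit():
            done.add(int(prefix))
    # Only shards without a marker need to be downloaded on the rerun.
    return [shard_id for shard_id in range(total_shards) if shard_id not in done]


def demo():
    """Simulate a run that finished shards 0, 1 and 3 out of 5."""
    with tempfile.TemporaryDirectory() as tmp:
        for shard_id in (0, 1, 3):
            marker = Path(tmp) / f"{shard_id:05d}_stats.json"
            marker.write_text(json.dumps({"count": 0}))
        return pending_shards(tmp, total_shards=5)


if __name__ == "__main__":
    print(demo())
```

In this hypothetical layout, a rerun would only schedule the shards returned by `pending_shards`, so a job that stopped silently could pick up where it left off instead of starting over.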
Hi @rom1504, for datasets like LAION-400M or LAION-5B, it is very hard to download them to a single disk when disk space is limited. What I did was...
Hi, thanks for your awesome work. I am using AWS EMR (with Spark) to download the LAION5B dataset by following this [distributed mode](https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion5B.md). However, when I run the download.py in...
https://redcaps.xyz/download
Useful to use img2dataset for inference directly without saving to disk
So people could run it on their output to check that everything is as expected. Some things can be integrated directly as metrics in img2dataset, but maybe not everything.
For example:
* pure ssh
* dask cluster?
* ray cluster?

Follow-up of https://github.com/rom1504/img2dataset/issues/20
The current tfrecord format is 30% slower than webdataset when writing to a non-local filesystem. It also supports only a limited subset of file systems. Let's figure out a way...