img2dataset
img2dataset copied to clipboard
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
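For scale, and assuming the 20h is wall-clock time for the whole run, the headline figure corresponds to an average throughput of roughly 1,400 images per second:

```latex
\frac{100 \times 10^{6}\ \text{images}}{20 \times 3600\ \text{s}} = \frac{10^{8}\ \text{images}}{72\,000\ \text{s}} \approx 1389\ \text{images/s}
```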
Some datasets store, along with the image (and optional caption/label), an md5 hash of the image. For example, FaceScrub and PubFig do this. While recent datasets don't store hashes (and...
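A minimal sketch of what checking such a stored hash could look like, using Python's standard `hashlib`. The helper names `md5_hex` and `verify_md5` are hypothetical and not part of img2dataset. One caveat is certain: an md5 of the original file only matches the raw downloaded bytes, so any check like this would have to run before the image is resized or re-encoded.

```python
import hashlib


def md5_hex(image_bytes: bytes) -> str:
    """Return the hex md5 digest of the raw (undecoded, unresized) bytes."""
    return hashlib.md5(image_bytes).hexdigest()


def verify_md5(image_bytes: bytes, expected_md5: str) -> bool:
    """Hypothetical check: do the downloaded bytes match the stored hash?

    The expected value is normalized to lowercase, since hash lists
    may store digests in either case.
    """
    return md5_hex(image_bytes) == expected_md5.strip().lower()


# Usage: compare bytes fetched from a URL against the dataset's stored md5.
downloaded = b"abc"  # stand-in for the raw bytes returned by the HTTP request
print(verify_md5(downloaded, "900150983CD24FB0D6963F7D28E17F72"))  # True
```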
I have created a .txt file with the URLs taken from the cheesecake class of the LAION-5B dataset. I run the command img2dataset --url_list=subset.txt --output_folder=out and it produces the following...
https://github.com/rom1504/img2dataset/blob/main/img2dataset/reader.py#L72
One way might be to use something really rare as the separator, or to tell pyarrow that there is a single column so separators aren't searched for.
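The two suggestions can be illustrated without pyarrow, using Python's standard `csv` module as a stand-in. This is not the code in reader.py; the sample URL list and function names are hypothetical. In pyarrow itself, the corresponding knobs would presumably be `pyarrow.csv.ParseOptions(delimiter=..., quote_char=False)` together with `ReadOptions(column_names=["url"])`, but how reader.py would pass them is not shown here.

```python
import csv
import io

# Hypothetical url list: the first URL legitimately contains a comma.
URL_LIST = "https://example.com/img,1.jpg\nhttps://example.com/plain.jpg\n"


def read_with_comma_separator(text):
    # Default parsing searches for "," and splits the first URL in two.
    return list(csv.reader(io.StringIO(text)))


def read_with_rare_separator(text):
    # Suggestion 1: a separator that should never appear in a URL
    # (ASCII unit separator, 0x1F), with quote handling switched off
    # so a '"' inside a URL is kept verbatim.
    reader = csv.reader(io.StringIO(text), delimiter="\x1f", quoting=csv.QUOTE_NONE)
    return [row[0] for row in reader if row]


def read_single_column(text):
    # Suggestion 2: one URL per line, no separator search at all.
    return [line.strip() for line in text.splitlines() if line.strip()]


print(read_with_comma_separator(URL_LIST)[0])  # ['https://example.com/img', '1.jpg']
print(read_with_rare_separator(URL_LIST)[0])   # https://example.com/img,1.jpg
```

Either variant keeps the comma-containing URL intact; the single-column reading is the simpler of the two because it never has to guess which character is safe.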
Provide a Slurm batch strategy for img2dataset distributed
To do:
A. Find a way to install the resolver automatically on instances, without root or docker
B. Create a distribution...
Being able to crop images centred on desired features, detected in the form of keypoints, could be useful. For example: if the image likely contains a human face, I want...
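A minimal sketch of the geometry such a crop could use, assuming the keypoints have already been produced by some detector (detection itself is out of scope here). The helper `keypoint_centred_crop` is hypothetical: it centres a fixed-size box on the mean of the keypoints and shifts it to stay inside the image. The returned `(left, top, right, bottom)` tuple has the same layout as the box argument of PIL's `Image.crop`.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def keypoint_centred_crop(width: int, height: int, keypoints: List[Point],
                          crop_w: int, crop_h: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: (left, top, right, bottom) box of size
    crop_w x crop_h centred on the keypoints' mean, kept inside the image."""
    if keypoints:
        cx = sum(x for x, _ in keypoints) / len(keypoints)
        cy = sum(y for _, y in keypoints) / len(keypoints)
    else:
        # No detected feature: fall back to a plain centre crop.
        cx, cy = width / 2, height / 2
    # The box can never be larger than the image itself.
    crop_w, crop_h = min(crop_w, width), min(crop_h, height)
    left = int(round(cx - crop_w / 2))
    top = int(round(cy - crop_h / 2))
    # Shift rather than shrink the box when it would leave the image.
    left = max(0, min(left, width - crop_w))
    top = max(0, min(top, height - crop_h))
    return left, top, left + crop_w, top + crop_h


# A face detected near the right edge of a 100x100 image.
print(keypoint_centred_crop(100, 100, [(95, 50)], 20, 20))  # (80, 40, 100, 60)
```

Shifting the box (instead of padding or shrinking it) keeps every output the same size, which matters when the crops are later packed into fixed-resolution shards.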
* 100k representative images (10 shards)
* A table in .md with current numbers

Then use it to benchmark several solutions. For example:
* Albumentations batching support
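A minimal sketch of a harness that could produce such a table, assuming each candidate solution can be wrapped as a function that processes a list of encoded images. The candidates below are placeholders, not img2dataset's resizer or an Albumentations pipeline, and `benchmark` / `to_markdown_table` are hypothetical names. Taking the best of several repeats reduces noise from caching and other processes.

```python
import time
from typing import Callable, Dict, List


def benchmark(solutions: Dict[str, Callable[[List[bytes]], object]],
              images: List[bytes], repeats: int = 3) -> Dict[str, float]:
    """Time each candidate on the same images; keep the best images/second."""
    results: Dict[str, float] = {}
    for name, process in solutions.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            process(images)
            best = min(best, time.perf_counter() - start)
        results[name] = len(images) / best if best > 0 else float("inf")
    return results


def to_markdown_table(results: Dict[str, float]) -> str:
    """Render results as a .md table, fastest solution first."""
    lines = ["| solution | images/s |", "| --- | ---: |"]
    for name, rate in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"| {name} | {rate:.1f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholders: real candidates would decode and resize these bytes.
    fake_images = [bytes(1024)] * 1000
    candidates = {
        "noop": lambda imgs: None,
        "copy_bytes": lambda imgs: [bytes(b) for b in imgs],
    }
    print(to_markdown_table(benchmark(candidates, fake_images)))
```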
webp and gif: the work to do that is mostly in the resizer.
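One small piece of that resizer work could be recognizing the container before choosing how to decode it. The sketch below is a hypothetical helper (`sniff_format` is not an img2dataset function) that identifies formats by their standard leading magic bytes: `GIF87a`/`GIF89a` for GIF, `RIFF` followed by `WEBP` at offset 8 for WebP. It does not cover decoding, nor what to do with animated GIF frames.

```python
def sniff_format(data: bytes) -> str:
    """Identify an image container from its leading magic bytes."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    return "unknown"


print(sniff_format(b"GIF89a\x01\x00\x01\x00"))  # gif
```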
Hi. Thanks for the amazing repository. It really makes the workflow very easy. I was wondering if you are considering adding video datasets as well. Some are based on...