img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
It was recently noticed that [laion 400m](https://laion.ai/laion-400-open-dataset/) only contains URLs from 5M domains. The same is probably true for other datasets. Pre-resolving the domains would decrease the load on the...
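To illustrate the idea in the excerpt above, here is a minimal sketch of pre-resolving domains: collect the distinct hostnames from a URL list, then resolve each one a single time. It uses only the Python standard library and is not how img2dataset itself handles DNS; the function names `count_domains` and `pre_resolve`, and the injectable `resolver` parameter, are hypothetical and exist only for this example.

```python
import socket
from collections import Counter
from urllib.parse import urlsplit


def count_domains(urls):
    """Count how many URLs point at each hostname (hostnames are lowercased)."""
    counts = Counter()
    for url in urls:
        try:
            host = urlsplit(url).hostname  # None for relative or scheme-less input
        except ValueError:
            # urlsplit rejects some malformed inputs, e.g. a broken IPv6 literal
            continue
        if host:
            counts[host] += 1
    return counts


def pre_resolve(domains, resolver=socket.gethostbyname):
    """Resolve each domain once; domains that fail to resolve map to None."""
    resolved = {}
    for domain in domains:
        try:
            resolved[domain] = resolver(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[domain] = None
    return resolved
```

With many URLs per domain, the number of lookups in this sketch is bounded by the number of distinct hostnames rather than by the number of URLs. Passing a custom `resolver` makes it possible to try the logic without network access.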
Useful to demonstrate the usefulness of this kind of dataset
Hello, first I would like to congratulate you on the amazing work on this lib. **Issue** I'm trying to download LAION5B using img2dataset on an EMR cluster using these...
For Vision and Language pretraining, cc3m, mscoco, SBU captions and VG are very relevant datasets. I haven't been able to download SBU captions and VG. Here are my questions. 1) How...
Thanks for the great work! I just wonder how I can get the flickr30k dataset using this?
Bumps [mypy](https://github.com/python/mypy) from 1.8.0 to 1.9.0. Changelog Sourced from mypy's changelog. Mypy Release Notes Mypy 1.9 We’ve just uploaded mypy 1.9 to the Python Package Index (PyPI). Mypy is a...
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.0.0 to 8.0.2. Release notes Sourced from pytest's releases. 8.0.2 pytest 8.0.2 (2024-02-24) Bug Fixes #11895: Fix collection on Windows where initial paths contain the short version...
Bumps [pylint](https://github.com/pylint-dev/pylint) from 3.0.3 to 3.1.0. Commits 053c2c3 Bump pylint to 3.1.0, update changelog c954636 Upgrade release documentation, and contributors.txt 7300ed2 Discover .pyi files (#9241) 9dbf3df Merge maintenance 3.0.x into...
I have downloaded the metadata. Are there any other problems? This is my code: img2dataset --url_list laion400m-meta --input_format "parquet" --url_col "URL" --caption_col "TEXT" --output_format webdataset --output_folder laion400m-data --processes_count 16...