img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.

Results 164 img2dataset issues

It was recently noticed that [laion 400m](https://laion.ai/laion-400-open-dataset/) only contains URLs from 5M domains. The same is probably true for other datasets. Pre-resolving the domains would decrease the load on the...
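The pre-resolution idea in the issue above can be sketched in a few lines of Python. This is only an illustration, assuming the URLs are available as plain strings: the helper names `unique_hosts`, `default_resolver`, and `pre_resolve` are hypothetical and are not part of img2dataset. The sketch collects each distinct hostname once and resolves it a single time, so a dataset with many URLs per domain would trigger far fewer DNS lookups.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional
from urllib.parse import urlsplit


def unique_hosts(urls: Iterable[str]) -> List[str]:
    """Return the distinct hostnames found in `urls`, in first-seen order."""
    seen: Dict[str, None] = {}
    for url in urls:
        # urlsplit lowercases the hostname and yields None when there is none.
        host = urlsplit(url).hostname
        if host:
            seen.setdefault(host, None)
    return list(seen)


def default_resolver(host: str) -> Optional[str]:
    """Resolve `host` to an IPv4 address, or return None if lookup fails."""
    try:
        return socket.gethostbyname(host)
    except (OSError, UnicodeError):
        return None


def pre_resolve(
    urls: Iterable[str],
    resolver: Callable[[str], Optional[str]] = default_resolver,
) -> Dict[str, Optional[str]]:
    """Resolve every distinct hostname exactly once (hypothetical helper)."""
    return {host: resolver(host) for host in unique_hosts(urls)}
```

The resolver is passed in as a parameter so that the lookup step can be swapped for a caching or rate-limited one; how the resulting table would be fed back into the downloader is left open here, since the issue text does not say.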

Useful to demonstrate the value of this kind of dataset

documentation

Hello, first I would like to congratulate you on the amazing work on this lib. **Issue** I'm trying to download LAION5B using img2dataset on an EMR cluster using these...

For vision-and-language pretraining, cc3m, mscoco, SBU Captions and VG are very relevant datasets. I haven't been able to download SBU Captions and VG. Here are my questions. 1) How...

Thanks for the great work! I just wonder how I can get the flickr30k dataset using this?

Bumps [mypy](https://github.com/python/mypy) from 1.8.0 to 1.9.0. Changelog Sourced from mypy's changelog. Mypy Release Notes Mypy 1.9 We’ve just uploaded mypy 1.9 to the Python Package Index (PyPI). Mypy is a...

dependencies

Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.0.0 to 8.0.2. Release notes Sourced from pytest's releases. 8.0.2 pytest 8.0.2 (2024-02-24) Bug Fixes #11895: Fix collection on Windows where initial paths contain the short version...

dependencies

Bumps [pylint](https://github.com/pylint-dev/pylint) from 3.0.3 to 3.1.0. Commits 053c2c3 Bump pylint to 3.1.0, update changelog c954636 Upgrade release documentation, and contributors.txt 7300ed2 Discover .pyi files (#9241) 9dbf3df Merge maintenance 3.0.x into...

dependencies

I have downloaded the metadata. Are there any other problems? ![image](https://github.com/rom1504/img2dataset/assets/57973825/b148bdad-c922-49e2-a243-6af1ff1a4784) This is my code: img2dataset --url_list laion400m-meta --input_format "parquet" --url_col "URL" --caption_col "TEXT" --output_format webdataset --output_folder laion400m-data --processes_count 16...
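When debugging a command like the one in this issue, it can help to build it from a dictionary of options so each flag and value is visible before anything is launched. The sketch below is a minimal illustration: `build_img2dataset_command` is a hypothetical helper, not part of img2dataset, and it uses only the flags visible in the issue excerpt. The flags cut off by the trailing "..." are not reproduced.

```python
import shlex
from typing import Dict, List


def build_img2dataset_command(options: Dict[str, object]) -> List[str]:
    """Turn an options mapping into an argument list (hypothetical helper)."""
    argv = ["img2dataset"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flags copied from the issue excerpt; anything after them is elided there.
options = {
    "url_list": "laion400m-meta",
    "input_format": "parquet",
    "url_col": "URL",
    "caption_col": "TEXT",
    "output_format": "webdataset",
    "output_folder": "laion400m-data",
    "processes_count": 16,
}

command = build_img2dataset_command(options)
print(shlex.join(command))  # shell-quoted form, handy for logs
```

The resulting list could then be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with column names or paths.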