
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
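For context, a minimal invocation through the project's Python API looks roughly like the sketch below; the file names and parameter values are placeholders, not recommendations.

```python
from img2dataset import download

# Download and resize every image listed in a plain-text URL file
# (paths, counts and sizes are illustrative).
download(
    url_list="myimglist.txt",
    input_format="txt",
    output_folder="output_folder",
    output_format="files",
    image_size=256,
    processes_count=16,
    thread_count=32,
)
```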

Results: 164 img2dataset issues

This is an attempt to fix #332 in a simple manner (not using anything fancy like urllib3.Retry). I think it should improve download performance significantly on datasets with large amounts...
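For reference, a plain retry loop of this kind might look like the sketch below; the function name, retry count and backoff are illustrative, not the PR's actual code.

```python
import time
import urllib.request

def download_with_retry(url, retries=3, backoff=2.0, timeout=10):
    # Hypothetical helper: retry a fetch a few times with a growing delay,
    # without relying on urllib3.Retry.
    last_err = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except Exception as err:  # timeouts, connection resets, HTTP errors
            last_err = err
            time.sleep(backoff * (attempt + 1))
    raise last_err
```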

Is there any way to use a proxy when downloading images? Sometimes the image's address can't be reached directly due to network restrictions, and intermediate proxies are...
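As a generic workaround at the urllib level, traffic can be routed through a proxy; the proxy address below is a placeholder and this is not a built-in img2dataset option.

```python
import urllib.request

# Route HTTP and HTTPS traffic through an intermediate proxy
# (the proxy URL is a placeholder).
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})
opener = urllib.request.build_opener(proxy_handler)

with opener.open("https://example.com/image.jpg", timeout=10) as resp:
    data = resp.read()
```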

Scripts and software for automated scraping must follow robots.txt rules; otherwise the user may be liable for unauthorised use of data.
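A hypothetical pre-download check using Python's standard robotparser could look like this; the user-agent string and the fail-open behaviour on an unreachable robots.txt are assumptions, not current img2dataset behaviour.

```python
from urllib import robotparser
from urllib.parse import urlparse

def allowed_by_robots(url, user_agent="img2dataset"):
    # Fetch the host's robots.txt and ask whether this URL may be crawled.
    parts = urlparse(url)
    parser = robotparser.RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        parser.read()
    except Exception:
        return True  # robots.txt missing or unreachable: fail open
    return parser.can_fetch(user_agent, url)
```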

enhancement

See https://noml.info/

```diff
diff --git a/README.md b/README.md
index 12fd5e6..f9b65d1 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ For better performance, it's highly recommended to set up a fast dns...
```

When I tried to download coyo-700m with img2dataset, I got an error: "pyarrow.lib.ArrowInvalid: Could not open Parquet input source '': Parquet magic bytes not found in footer. Either the file...
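That error usually means a shard file is empty, truncated, or not actually Parquet. A quick sanity pass with pyarrow can locate the offending file; the glob pattern below is an assumption about where the metadata shards live.

```python
import glob
import pyarrow.parquet as pq

# Scan every shard and report the ones whose footer can't be read
# (the path pattern is illustrative).
for path in sorted(glob.glob("coyo-700m/*.parquet")):
    try:
        pq.read_metadata(path)
    except Exception as err:
        print(f"bad parquet file: {path}: {err}")
```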

https://github.com/rom1504/img2dataset/tree/streaming_refacto is some work I started on this about 8 months ago; I still think it's the right direction. ![Screenshot_20230820_233013](https://github.com/rom1504/img2dataset/assets/2346494/e3433301-be3e-4321-9848-8bd15f0eddd2) May try to finish it soon; it would close #82 #188 and...

The current implementation seems to fail when the URLs in a `.txt` input file have commas in them. This modification seems to fix the bug. (Disclaimer: I am not 100%...
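For illustration only, reading a `.txt` list as one URL per line and skipping CSV parsing avoids the comma issue; this is a minimal sketch, not the project's actual reader or the PR's fix.

```python
def read_url_list(path):
    # One URL per line; commas inside a URL stay intact because no
    # CSV parsing is involved. Hypothetical helper for illustration.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

urls = read_url_list("urls.txt")
# e.g. "https://example.com/render?size=256,256" stays a single URL
```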

https://www.reddit.com/r/StableDiffusion/comments/16v4ld8/25_million_creative_commons_image_dataset_released/?one_tap=true

Hi. When trying to download many images, I often noticed that the job seemed to stop making progress near the end. There could remain less than 1% of the...