Add page limit option

Open milanvarady opened this issue 2 years ago • 6 comments

I've noticed that when downloading more than 30 images or so, the downloader sometimes can't find any more and keeps indexing pages without success. To counter this, I added a page_limit option that caps the number of pages it indexes. I updated the README to document the option, and I added some prints that show whether it stopped because of the download limit or the page limit.
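
For readers following along, here is a minimal sketch of the idea, not the actual change to the package. The names collect_links and fetch_page are hypothetical; fetch_page stands in for whatever returns the image links found on one results page. The loop stops as soon as either limit is reached and reports which one it was.

```python
from typing import Callable, List, Tuple


def collect_links(
    fetch_page: Callable[[int], List[str]],  # hypothetical: links found on page N
    download_limit: int,
    page_limit: int,
) -> Tuple[List[str], str]:
    """Collect unique links until download_limit or page_limit is hit.

    Returns the links plus the reason the loop stopped.
    """
    links: List[str] = []
    seen = set()
    page = 0
    while len(links) < download_limit:
        if page >= page_limit:
            # Stop indexing instead of paging forever with no new results.
            return links, "page limit"
        for link in fetch_page(page):
            if link in seen:
                continue  # pages can repeat links already collected
            seen.add(link)
            links.append(link)
            if len(links) == download_limit:
                return links, "download limit"
        page += 1
    return links, "download limit"
```

Without the page_limit check, a source that keeps returning empty or repeated pages would keep the while loop running indefinitely, which matches the stall described above.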

milanvarady avatar Jul 04 '22 09:07 milanvarady

This also fixes issue #3.

milanvarady avatar Jul 04 '22 09:07 milanvarady

So I have a question for you. I made a fork of this and have been adding improvements to it, and I'm debating adding this in. However, I've been doing large downloads like the ones you mentioned (50+ images), and since I work through a list of queries, I've just let it run overnight each time. Eventually it has either found enough images or hit an "out of links" error. Is this not your experience? If it stalls for you, would you mind sharing the query that causes it?

ghost avatar Aug 23 '22 17:08 ghost

I found that it stalls every time; the query doesn't really matter. You mentioned that you let it run overnight. That works if you want to do a single query and have time, but I found that you get the best results by running multiple queries with variations. For instance, if I want to make a rabbit dataset, I would run the program multiple times with queries like these: rabbit, domestic rabbit, bunny, white rabbit, black rabbit, baby rabbit, European rabbit, etc. With this method, it is key that each run finishes in a reasonable amount of time.

milanvarady avatar Aug 23 '22 18:08 milanvarady

Yeah, that's not an issue for me. When using the downloader, I read in a CSV of queries and then iterate over the list with a download command for each one. So for me it's just a matter of putting in a list of 300 or so queries, setting it to 50 to 100 images per query, and letting it finish when it finishes.
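
A rough sketch of that batch workflow, under stated assumptions: the queries sit in the first column of the CSV, and download is any callable that takes a query and a per-query image limit. The names read_queries and run_batch are made up for illustration and are not part of the package.

```python
import csv
from typing import Callable, Iterable, List, TextIO


def read_queries(csv_file: TextIO) -> List[str]:
    """Return the non-empty first-column values of an open CSV file."""
    queries = []
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            queries.append(row[0].strip())
    return queries


def run_batch(
    queries: Iterable[str],
    download: Callable[[str, int], None],  # assumption: (query, limit) callable
    limit: int = 50,
) -> List[str]:
    """Run one download per query and return the queries that failed."""
    failed = []
    for query in queries:
        try:
            download(query, limit)
        except Exception as exc:  # keep going so one bad query does not end the run
            print(f"{query!r} failed: {exc}")
            failed.append(query)
    return failed
```

In practice, download would be a thin wrapper around the package's own download call. With a setup like this, a query that never terminates blocks every query after it, so a page limit could still matter even for overnight runs.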

ghost avatar Aug 23 '22 21:08 ghost

Ultimately, I can make some changes to make this optional, so that anyone for whom time matters can turn it on and everyone else can leave it off.

milanvarady avatar Aug 24 '22 07:08 milanvarady

Don't worry about making the edits. I wasn't trying to get you, or anyone, to do any more "work". I was just trying to get a better understanding of what you were describing, as I'm fairly new to this. I think I may put it in as an optional parameter, as you just suggested. I just wanted to understand what you were seeing first.

ghost avatar Aug 24 '22 11:08 ghost