bing_image_downloader
Add page limit option
I've noticed that when downloading more than about 30 images, the downloader sometimes can't find any more and keeps indexing pages without success. To counter this, I added a page_limit option that limits the number of pages it indexes. I also updated the README to include this option, and I added some prints to show whether it stopped because of the download limit or the page limit.
This also fixes issue #3.
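A minimal sketch of the stopping logic described above, not the actual code from this PR. The `crawl` function and the `fetch_page` callable are hypothetical stand-ins for the downloader's page-indexing loop; only the `page_limit` name comes from the description.

```python
from typing import Callable, List, Tuple


def crawl(fetch_page: Callable[[int], List[str]],
          limit: int,
          page_limit: int) -> Tuple[List[str], str]:
    """Collect up to `limit` unique links, giving up after `page_limit` pages.

    Returns the links found and the reason the loop stopped.
    """
    seen = set()
    links: List[str] = []
    page = 0
    while len(links) < limit:
        if page >= page_limit:
            # Stop indexing instead of looping forever on empty pages.
            return links, "page limit"
        for link in fetch_page(page):
            if link not in seen:
                seen.add(link)
                links.append(link)
                if len(links) == limit:
                    break
        page += 1
    return links, "download limit"


def fake_fetch(page: int) -> List[str]:
    # Hypothetical source: only the first two pages return any links.
    return [f"img{page}_{i}" for i in range(3)] if page < 2 else []


links, reason = crawl(fake_fetch, limit=10, page_limit=5)
print(f"Stopped because of the {reason} after finding {len(links)} links")
```

Here the fake source runs dry after six links, so the loop ends on the page limit rather than indexing empty pages indefinitely.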
I have a question for you. I made a fork of this and have been putting improvements into it, and I'm debating adding this in. However, I've been doing large downloads of 50+ images, as you mentioned. Since I've been working from a list of queries, I've just let it run overnight each time, and eventually it has either found enough images or hit an "out of links" error. Is this not your experience? If it isn't, would you mind sharing the query that stalls it?
I found that it stalls every time; the query doesn't really matter. You mentioned that you let it run overnight. That works if you want to run a single query and have the time, but I found that you get the best results by running multiple queries with variations. For instance, if I want to make a rabbit dataset, I run the program multiple times with queries like these: rabbit, domestic rabbit, bunny, white rabbit, black rabbit, baby rabbit, European rabbit, etc. With this method, it is key that each run finishes in a reasonable amount of time.
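The query-variation approach could look roughly like the sketch below. The `collect_variations` helper and the `search` callable are hypothetical, and the fake results are made up for illustration. Deduplicating across queries is an assumption on my part, since related queries seem likely to return overlapping images.

```python
from typing import Callable, List


def collect_variations(queries: List[str],
                       search: Callable[[str], List[str]],
                       per_query_limit: int) -> List[str]:
    """Run each query variation and merge the results without duplicates."""
    seen = set()
    merged: List[str] = []
    for query in queries:
        # Take only the first few results so each run stays short.
        for url in search(query)[:per_query_limit]:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def fake_search(query: str) -> List[str]:
    # Hypothetical results; a real run would call the downloader here.
    results = {
        "rabbit": ["a.jpg", "b.jpg", "c.jpg"],
        "bunny": ["b.jpg", "d.jpg"],
        "white rabbit": ["c.jpg", "e.jpg", "f.jpg"],
    }
    return results.get(query, [])


dataset = collect_variations(["rabbit", "bunny", "white rabbit"],
                             fake_search, per_query_limit=2)
print(dataset)
```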
Yeah, that's not an issue for me, because when I use the downloader I read in a CSV of queries and then iterate over the list with a download command for each one. So for me it's just a matter of putting in a list of 300 or so queries and setting it to 50 to 100 images per query, and it finishes when it finishes.
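A rough sketch of that CSV-driven batch, assuming one query per row in the first column. The `read_queries` and `run_batch` helpers are hypothetical, and the `download` callable is a placeholder for whatever download command is actually used.

```python
import csv
import io
from typing import Callable, Iterable, List


def read_queries(csv_file: Iterable[str]) -> List[str]:
    """Read one query per row from the first column, skipping blank rows."""
    reader = csv.reader(csv_file)
    return [row[0].strip() for row in reader if row and row[0].strip()]


def run_batch(queries: List[str],
              download: Callable[[str, int], None],
              limit: int) -> None:
    """Issue one download command per query, in order."""
    for query in queries:
        download(query, limit)


# In-memory stand-in for a CSV file on disk.
sample_csv = io.StringIO("rabbit\ndomestic rabbit\n\nbunny\n")
queries = read_queries(sample_csv)

calls = []
# Placeholder download command that only records what it was asked to do.
run_batch(queries, lambda query, limit: calls.append((query, limit)), limit=50)
print(calls)
```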
Ultimately, I can make some changes so that this is optional: if time matters to someone, they can turn it on, and if not, they can leave it off.
Don't worry about making the edits. I wasn't trying to get you, or anyone, to do any more "work". I was just trying to get a better understanding of what you were saying, as I'm fairly new to this. I think I may put it in as an optional parameter, as you just said. I just wanted to understand what you were seeing first.