Google-Images-Search
Search randomly fails with an HTTP error.
I am trying to download 200 images of a given object. Here is the configuration of the search header:
_search_params = {
    'q': keyword,
    'num': quantity,
    # 'fileType': 'jpg',
    # 'rights': 'cc_nonderived',
    # 'safe': 'medium',
    'imgType': 'photo',
    'imgSize': 'imgSizeUndefined',
    'imgDominantColor': 'imgDominantColorUndefined',
    'imgColorType': 'imgColorTypeUndefined'
}
Here is the error it throws:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://customsearch.googleapis.com/customsearch/v1?cx=***&q=door&searchType=image&num=10&start=201&imgType=photo&imgSize=imgSizeUndefined&safe=off&imgDominantColor=imgDominantColorUndefined&imgColorType=imgColorTypeUndefined&key=***&alt=json returned "Request contains an invalid argument.". Details: "Request contains an invalid argument.">
I am not sure what is wrong, and I need this to work reliably. Is there a way I can just catch the error and move on? The odd thing is that it chokes after downloading exactly 145 images every time.
Note: I intentionally censored the CX ID and the API key; the real code uses the correct ones.
I tried a different search term and it exited after downloading 117 images.
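In the meantime, the "catch the error and move on" part can be sketched like this. This is only a sketch: search here is a hypothetical stand-in for the gis.search call, and in real code you would catch googleapiclient.errors.HttpError (the type shown in the traceback above) rather than a bare Exception.

```python
# Sketch of "catch and move on": run one search per keyword and let a
# failing keyword be skipped instead of crashing the whole batch.
# 'search' is a stand-in for the real gis.search call.
def download_batch(keywords, search):
    downloaded = {}
    for kw in keywords:
        try:
            downloaded[kw] = search(kw)
        except Exception as err:  # narrow to HttpError in real code
            print(f"skipping {kw!r}: {err}")
    return downloaded
```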
Hi @DragonflyRobotics
This is a well-known limitation of the Google Search API: requests fail when the sum of the start and num query parameters is greater than 100:
https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
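To illustrate the documented rule: if a paging loop takes the quoted constraint (start + num must not exceed 100) literally, the valid page starts for num=10 run out quickly. A sketch, with the caveat that the exact boundary may be off by one depending on how the API counts results:

```python
# Sketch of the documented paging cap: keep issuing pages only while
# start + num stays at or below the cap quoted above.
def valid_starts(num=10, cap=100):
    starts = []
    start = 1
    while start + num <= cap:
        starts.append(start)
        start += num
    return starts
```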
Frankly, I don't know how to tackle this except by raising a friendly exception or warning or something similar.
Somehow, I was able to download 100 images easily. It choked after 110 or 120 images.
Yes, that limit is a pain. Will investigate this further.
I will also try researching and assisting with this issue. I found your repo incredibly useful in my project.
I have been messing around with GIS some more. I found that it doesn't stop exactly at 100. Furthermore, it downloads more images for some keywords and fewer for others. I think it might not have to do with the Google download cap.
Hi @DragonflyRobotics
Not all images out there are valid and good to download. A lot of them are plain unreachable, producing errors 4xx and higher. That is why some keywords download more images and some fewer: this lib validates each image's availability prior to downloading.
There is nothing more to this lib. If it wasn't for this Google API limit, this lib would download thousands of images without stopping.
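The availability check described above can be sketched roughly like this (probe is a stand-in for an HTTP HEAD request returning a status code; the library's actual internals may differ):

```python
# Sketch of pre-download validation: keep only URLs whose probe returns
# a status below the 4xx range, mirroring the "skip unreachable images"
# behavior described above. 'probe' stands in for an HTTP HEAD request.
def filter_reachable(urls, probe):
    return [url for url in urls if probe(url) < 400]
```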
And it stops with the "Request contains an invalid argument." error from Google, using the same arguments as before the error.
I've tested it again now with num=200, and it looks like the start + num > 100 limit doesn't apply at all. The API goes past the 100 mark just fine. But once the start argument surpasses 200, you get the "Request contains an invalid argument." error, and the invalid part is the start argument being bigger than 200. There is no other explanation: nothing else changes from request to request.
If there is a hard limit in the Google API of the start argument being <= 200, maybe simply return when that limit would be exceeded, before making the new request? It is kind of a downer, of course, when you only get what you can, I know. It's still better than having it throw an exception.
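That early-return idea might look roughly like this. It is only a sketch: fetch_page is a stand-in for the real paged API call, and 200 is the cap observed empirically in the comments above.

```python
# Sketch: stop paging cleanly once the next request's 'start' would pass
# the observed 200-result cap, instead of issuing a request that 400s.
def download_up_to_cap(fetch_page, wanted, page=10, cap=200):
    results = []
    start = 1
    while len(results) < wanted:
        if start > cap:        # next request would be rejected by Google
            break              # return what we managed to collect
        results.extend(fetch_page(start, page))
        start += page
    return results
```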
I am looking to search through batches of different images, so I would rather not have the process crash (though I do plan to handle the exception in my code and move on to the next query in the queue).
I think that is a good idea. We can simply run programmatically until the <200 limit is reached, then stop the search instance, make a new one, and continue downloading.
The problem here is that you simply cannot get more than 200 different images with one search query. When you reach start + num > 200, the game is over: use a different query term. That's not my rule, it's Google's.
And I don't think a silent failure in that case is a good idea. Everyone should be aware of this limit and handle it themselves.
The problem is that if the last query has parameters like start=193 and num=5, which goes beyond the 200 limit, it will fail before getting any images. So my idea is, when that happens, to correct the num parameter so that it does not go beyond 200 when summed with the start param, and to throw an exception afterwards. That way you are aware of the limitation and have your images as well.
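A sketch of that proposal (GoogleLimit is the hypothetical exception name used in the snippet at the end of this thread, fetch stands in for the real request, and the exact off-by-one at the 200 boundary depends on how the API counts, so take the arithmetic as illustrative):

```python
# Sketch of the proposed fix: clamp the final page's 'num' so that
# start + num stays within the 200-result cap, then signal the caller
# with a (hypothetical) GoogleLimit exception once the cap is reached.
class GoogleLimit(Exception):
    """Signals that Google's 200-result cap was reached."""
    def __init__(self, images):
        super().__init__("reached Google's 200-result cap")
        self.images = images  # the clamped final page is still delivered

def fetch_page(fetch, start, num, cap=200):
    if start + num > cap:
        clamped = max(0, cap - start)       # shrink num to fit under the cap
        raise GoogleLimit(fetch(start, clamped) if clamped else [])
    return fetch(start, num)                # normal page, no clamping needed
```

Here the clamped images are attached to the exception for the sketch's sake; in the library itself they would presumably land in gis.results(), as the final snippet of this thread assumes.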
And your code should look like this:

from google_images_search.exception import GoogleLimit

try:
    gis.search(...)
except GoogleLimit:
    pass

for image in gis.results():
    pass