googlesearch
Too many requests
Hi, is there a rate limit for requests? I got a "too many requests" error:
HTTP Error 429: Too Many Requests
I’m afraid so, Google does some heavy rate limiting with somewhat unclear rules. You can try either manually entering a captcha or just use another IP address when that happens.
OK, that's not good. Do you have an example for handling the captcha? Or can I buy an account for the Google Search API to get more requests?
There is nothing in this Python script to bypass the captcha, you'd have to do it manually from a browser. Also this does not use the official API at all so there's no way to pay either. It's just a scraper. :)
Is there any way to use a proxy or to configure the request before calling the library? Thank you in advance.
There is: you can set the HTTP_PROXY and HTTPS_PROXY environment variables to define a proxy server. I also recommend simulating a real user as much as possible, and not going too deep into the search result pages, as that seems to be a trigger for the CAPTCHA as well.
Could you give me an example for HTTP_PROXY?
Maybe I should explain what I do:
I extract search terms from PDFs and want to search for them. In my use case I process, e.g., 20 PDFs, with exactly one search term per PDF, and then start a Google search with your API.
I get the first 10 search results, and only the URL of each.
That is all I do.
Something like this:
import os
# Route the library's HTTPS requests through your proxy.
os.environ["HTTPS_PROXY"] = "http://localhost:8080"
Running that before doing the search should work. You can change the hostname and port number if your proxy is somewhere else.
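To tie this to the workflow described above (one term per PDF, first 10 URLs only), a minimal sketch; the proxy address is a placeholder for wherever your proxy actually runs, and the live query loop is commented out so nothing hits Google when the snippet runs:

```python
import os

def set_proxy(proxy="http://localhost:8080"):
    """Route the library's HTTPS traffic through a proxy.

    Must be called before the first search request goes out; the
    address here is a placeholder for your own proxy.
    """
    os.environ["HTTPS_PROXY"] = proxy
    return os.environ["HTTPS_PROXY"]

def first_ten_urls(term):
    """First 10 result URLs for one term (one term per PDF, as above)."""
    from googlesearch import search  # this package; imported lazily
    return [url for url in search(term, stop=10)]

# set_proxy()
# for term in terms_from_pdfs:      # hypothetical list of extracted terms
#     print(first_ten_urls(term))   # runs live queries against Google
```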
Hello, you can use a different user agent and set a random pause of between 50 and 60 seconds. Then you'll get past the restriction.
(Example with a different package, but you can use this one the same way: https://github.com/fb76100/Scraping/blob/master/GoogleScraping)
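A sketch of the random-pause idea, assuming the `search()` function accepts a `pause` keyword (the delay between page fetches, which this library supports); the query string is just an example, and the live call is commented out so the snippet doesn't query Google:

```python
import random

def jittered_pause(low=50, high=60):
    """Random delay in seconds, as suggested above, so requests don't
    arrive on a fixed, bot-like schedule."""
    return random.uniform(low, high)

# from googlesearch import search
# for url in search("example query", stop=10, pause=jittered_pause()):
#     print(url)
```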
How do I set the proxy if the search is executed in a thread?
@reisenmachtfreude That is a fair question! You can try using the multiprocessing module instead of threads, you'll get better performance too. There is one very good use case for threads though, and it's when running on Windows systems. I don't have an easy solution for that one, though.
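Since environment variables are per-process, the multiprocessing suggestion also answers the proxy question: each worker can set its own HTTPS_PROXY without affecting the others. A sketch, where the proxy addresses are placeholders and the worker returns its env value instead of running a live search:

```python
import multiprocessing
import os

def worker(args):
    term, proxy = args
    # Each worker process has its own environment, so this assignment
    # cannot leak into the other workers.
    os.environ["HTTPS_PROXY"] = proxy
    # from googlesearch import search
    # return list(search(term, stop=10))   # the real, live version
    return os.environ["HTTPS_PROXY"]       # placeholder result for the sketch

if __name__ == "__main__":
    jobs = [("term one", "http://localhost:8080"),
            ("term two", "http://localhost:8081")]
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(worker, jobs))
```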
Thanks for your answer @MarioVilas I will try it out.
I am experiencing the same problem:
from googlesearch import search

for url in search('"Breaking Code" WordPress blog', stop=1):
    print(url)
Results in: HTTPError: HTTP Error 429: Too Many Requests
I have tried different IPs using a VPN but keep getting the error, even when running the code for the first time.
I'm getting the "too many requests" error too. After updating the package I had issues with urllib2, so I had to change that manually, and I also got an error on line 39 ("total text"), so I had to remove the [0] index.