
Too many Requests

Open holzfelix opened this issue 3 years ago • 13 comments

Hi, is there a rate limit for requests? I got a Too Many Requests error:

HTTP Error 429: Too Many Requests

holzfelix avatar Aug 24 '20 09:08 holzfelix

I’m afraid so, Google does some heavy rate limiting with somewhat unclear rules. You can try either manually entering a captcha or just use another IP address when that happens.

MarioVilas avatar Aug 24 '20 10:08 MarioVilas

Ok, that's not good. Do you have an example with the captcha? Or can I buy an account for the Google Search API to get more requests?

holzfelix avatar Aug 24 '20 10:08 holzfelix

There is nothing in this Python script to bypass the captcha, you'd have to do it manually from a browser. Also this does not use the official API at all so there's no way to pay either. It's just a scraper. :)

MarioVilas avatar Aug 24 '20 14:08 MarioVilas

> I’m afraid so, Google does some heavy rate limiting with somewhat unclear rules. You can try either manually entering a captcha or just use another IP address when that happens.

Is there any way to use a proxy or to configure the request before calling the library? Thank you in advance.

silantev-ai avatar Aug 26 '20 08:08 silantev-ai

There is: you can set the HTTP_PROXY and HTTPS_PROXY environment variables to define a proxy server. I also recommend simulating a real user as much as possible, and not going too deep into the search result pages, as that seems to be a trigger for the CAPTCHA as well.

MarioVilas avatar Aug 26 '20 11:08 MarioVilas

Do you have an example for HTTP_PROXY?

Maybe I should explain what I do:

I extract search terms from PDFs and want to search for them. In my use case I process e.g. 20 PDFs, each PDF with exactly one search term, and then start a Google search with your API.

I get the first 10 search results and only the URL.

That is all I do.

holzfelix avatar Aug 26 '20 11:08 holzfelix

Something like this:

import os
os.environ["HTTPS_PROXY"] = "http://localhost:8080"

Doing that before the search should work. You can change the hostname and port number if your proxy is somewhere else.

MarioVilas avatar Aug 26 '20 12:08 MarioVilas
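To double-check that the variable is actually being picked up, the standard library exposes the same lookup that urllib-based code uses to discover a proxy. A minimal sketch (the proxy address is a placeholder; adjust host and port to your setup):

```python
import os
import urllib.request

# Placeholder proxy address -- change hostname and port to match your proxy
os.environ["HTTPS_PROXY"] = "http://localhost:8080"

# getproxies() reads the *_PROXY environment variables, which is how
# urllib-based code (like this scraper) finds the proxy to route through
proxies = urllib.request.getproxies()
print(proxies["https"])  # http://localhost:8080
```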

Hello, you can use a different user agent and set a random pause between 50 and 60 seconds. Then you'll get past the restriction.

(example with a different package, but you can use this one the same way) https://github.com/fb76100/Scraping/blob/master/GoogleScraping

fb76100 avatar Sep 22 '20 12:09 fb76100
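The rotate-and-pause idea above can be sketched as two plain helpers. The user-agent strings below are illustrative examples, not values required by any API, and whether your installed version of the package lets you pass a custom user agent or pause is an assumption worth checking against its docs:

```python
import random
import time

# Illustrative desktop browser strings -- any realistic ones will do
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def random_user_agent():
    """Pick a different browser identity for each query."""
    return random.choice(USER_AGENTS)

def polite_pause(low=50.0, high=60.0):
    """Sleep a random 50-60 seconds between queries, per the comment above."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

Between consecutive searches you would call `polite_pause()` and pass `random_user_agent()` wherever the client lets you set the User-Agent header.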

How do I set the proxy if the search is executed in a thread?

reisenmachtfreude avatar May 31 '21 07:05 reisenmachtfreude

@reisenmachtfreude That is a fair question! You can try using the multiprocessing module instead of threads, you'll get better performance too. There is one very good use case for threads though, and it's when running on Windows systems. I don't have an easy solution for that one, though.

MarioVilas avatar Jun 02 '21 17:06 MarioVilas
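One reason the switch helps: threads all share a single process environment, so per-thread HTTPS_PROXY values would clobber each other, while each multiprocessing worker gets its own copy. A minimal sketch of that isolation (the proxy addresses are placeholders, and the real search would replace the comment inside the worker):

```python
import os
from multiprocessing import Pool

def fetch_via_proxy(proxy_url):
    # Each task sets the proxy in its own worker process; the change
    # never leaks back into the parent process
    os.environ["HTTPS_PROXY"] = proxy_url
    # ... the actual search would happen here ...
    return os.environ["HTTPS_PROXY"]

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(pool.map(fetch_via_proxy, ["http://localhost:8080",
                                         "http://localhost:8081"]))
```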

Thanks for your answer @MarioVilas I will try it out.

reisenmachtfreude avatar Jun 02 '21 18:06 reisenmachtfreude

I am experiencing the same problem:

from googlesearch import search

for url in search('"Breaking Code" WordPress blog', stop=1):
    print(url)

Results in HTTPError: HTTP Error 429: Too Many Requests.

I have tried different IPs using VPN but keep getting the error even when running the code the first time.

ThisIsManuel avatar Jan 08 '22 18:01 ThisIsManuel

Too many requests error here as well. After updating the package I had issues with urllib2, so I had to change that manually; I also got an error on line 39 ("total text"), so I had to remove the [0] index.

ihamquentin avatar Jan 16 '22 08:01 ihamquentin