Facing a "Too Many Requests" issue with googlesearch.
With concurrent requests to googlesearch, we receive the following:
642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643 raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Too Many Requests
Any idea how to add a proxy to the Google search?
Can you share the code, please?
@VinciGit00
You can reproduce the issue with the sample code below; it is the same issue we face when using ScrapegraphAI with multiple requests:
import concurrent.futures
from googlesearch import search

def fetch_url(query):
    return list(search(query, stop=10))

def main():
    query = "Weather in Pakistan"
    batch_size = 50
    res = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as executor:
        future_to_url = {executor.submit(fetch_url, query): i for i in range(batch_size)}
        for future in concurrent.futures.as_completed(future_to_url):
            try:
                urls = future.result()
                res.append(urls)
            except Exception as e:
                print(f"Error fetching data: {e}")
    return res

if __name__ == "__main__":
    result = main()
    print(len(result))
We need a proxy to avoid the too-many-requests issue.
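As a stopgap before proxy support lands, the 429s can also be softened by retrying with exponential backoff instead of failing the whole batch. A minimal sketch (the `flaky_search` stub below is only a stand-in for the real `googlesearch.search` call, so the example runs without hitting Google):

```python
import random
import time
from urllib.error import HTTPError


def search_with_backoff(do_search, retries=4, base_delay=2.0):
    """Call do_search(), retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        try:
            return do_search()
        except HTTPError as e:
            if e.code != 429 or attempt == retries - 1:
                raise  # not rate-limited, or out of retries
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Stub standing in for the real search: fails twice with 429, then succeeds.
calls = {"n": 0}

def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HTTPError("https://google.com", 429, "Too Many Requests", None, None)
    return ["https://example.com/result"]
```

In the repro above, each worker would wrap its call as `search_with_backoff(lambda: list(search(query, stop=10)))`; this does not remove the rate limit, it just spaces the workers out until Google stops returning 429.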
OK, but how do you integrate it with ScrapegraphAI?
@VinciGit00 Basically, in ScrapegraphAI we are using Google search, but we need to replace it with the following so that the proxy can be passed as an input parameter:
Package: googlesearch-python
from googlesearch import search
search(query, num_results=max_result, proxy=proxy)
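If a single proxy still gets rate-limited, a pool of proxies can be rotated round-robin across calls. A minimal sketch, assuming the `proxy` keyword accepted by googlesearch-python; the proxy URLs and the `proxied_search` helper are illustrative placeholders, not part of ScrapegraphAI:

```python
from itertools import cycle

# Placeholder proxy URLs -- replace with real proxies.
PROXIES = cycle([
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
])


def proxied_search(query, num_results=10, search_fn=None):
    """Run one search through the next proxy in the pool.

    search_fn defaults to googlesearch.search; it is injectable for testing.
    """
    if search_fn is None:
        from googlesearch import search as search_fn
    return search_fn(query, num_results=num_results, proxy=next(PROXIES))
```

Each call pulls the next proxy from the cycle, so consecutive requests leave from different IPs; with `itertools.cycle` the pool wraps around indefinitely.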
OK, I will update it.
OK, please update it in the new beta.