yagooglesearch

Search always returns an empty result list

Open rombru opened this issue 1 year ago • 5 comments

Hello, I'm using version 1.10.0 of the package (Python 3.12) on Windows, from Belgium. Each time I call the search() function, it returns an empty result list. When I run the same search in my browser, it works and returns results. It also works with the package https://github.com/MarioVilas/googlesearch.

I managed to reproduce the issue by opening the link in a private window, and noticed that it happens because of the content of the page (two screenshots attached).

My problem looks similar to issue #5, but not exactly the same. I guess this has something to do with cookies, but I don't really know how to solve it. I tried multiple configurations of the SearchClient, but the problem is always the same.

Here are the logs: logs.txt

Do you have any idea?

rombru, May 01 '24 20:05

Hi @rombru - apologies it took a few days to answer back. Can you provide me the entire command and switches you used?

opsdisk, May 09 '24 22:05

Here is the code:

import yagooglesearch

client = yagooglesearch.SearchClient(
    "Paris",
    tld="com",
    lang_html_ui="fr",
    lang_result="lang_fr",
    tbs="li:1",
    max_search_result_urls_to_return=20,
    http_429_cool_off_time_in_minutes=45,
    http_429_cool_off_factor=1.5,
    verbosity=5,
    verbose_output=True,
)
client.assign_random_user_agent()

results = []

for result in client.search():
    print(result)

And here are the logs:

2024-05-10 15:13:24,125 [MainThread  ] [INFO] Requesting URL: https://www.google.com/
2024-05-10 15:13:24,885 [MainThread  ] [DEBUG]     status_code: 200
2024-05-10 15:13:24,885 [MainThread  ] [DEBUG]     headers: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.4) Gecko/20091007 Firefox/3.5.4'}
2024-05-10 15:13:24,885 [MainThread  ] [DEBUG]     cookies: <RequestsCookieJar[<Cookie SOCS=CAAaBgiAx_WxBg for .google.com/>, <Cookie AEC=AQTF6Hx69P_kYG3EtIvcfGwbu_B-BX2NuoUD64fZgXUxLQmc99S60GpfTw for .google.com/>, <Cookie __Secure-ENID=19.SE=lY6fEcOnWjImUW4gHGpjFStmEmqTMePJ1iKBNVDNHgWYxXhgKbsAHfYv5no0t2F09H3rVAwBLp6dMbnXnEnLf5wj1oTxwrVRCPFfepWLhxAVEATkWO5q1x14qQULH8a1HndOsGPGfDIhWymH_kBJfZdsEWKHZa_hxTSmlVtzGqN7Gg73afgOD3ogSw for .google.com/>]>
2024-05-10 15:13:24,887 [MainThread  ] [DEBUG]     proxy: 
2024-05-10 15:13:24,887 [MainThread  ] [DEBUG]     verify_ssl: True
2024-05-10 15:13:24,888 [MainThread  ] [INFO] Stats: start=0, num=100, total_valid_links_found=0 / max_search_result_urls_to_return=20
2024-05-10 15:13:24,888 [MainThread  ] [INFO] Requesting URL: https://www.google.com/search?hl=fr&lr=lang_fr&q=Paris&num=100&btnG=Google+Search&tbs=li:1&safe=off&cr=&filter=0
2024-05-10 15:13:25,828 [MainThread  ] [DEBUG]     status_code: 200
2024-05-10 15:13:25,828 [MainThread  ] [DEBUG]     headers: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.4) Gecko/20091007 Firefox/3.5.4'}
2024-05-10 15:13:25,828 [MainThread  ] [DEBUG]     cookies: <RequestsCookieJar[]>
2024-05-10 15:13:25,828 [MainThread  ] [DEBUG]     proxy: 
2024-05-10 15:13:25,828 [MainThread  ] [DEBUG]     verify_ssl: True
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://accounts.google.com/ServiceLogin?hl=fr&continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://accounts.google.com/ServiceLogin?hl=fr&continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://accounts.google.com/ServiceLogin?hl=fr&continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://accounts.google.com/ServiceLogin?hl=fr&continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/technologies/cookies?hl=fr&utm_source=ucb
2024-05-10 15:13:25,835 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/technologies/cookies?hl=fr&utm_source=ucb
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://consent.google.com/dl?continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gl=NL&hl=fr&cm=2&pc=srp&uxe=none&src=1
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://consent.google.com/dl?continue=https://www.google.com/search?hl%3Dfr%26lr%3Dlang_fr%26q%3DParis%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gl=NL&hl=fr&cm=2&pc=srp&uxe=none&src=1
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/privacy?hl=fr&utm_source=ucb
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/privacy?hl=fr&utm_source=ucb
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/terms?hl=fr&utm_source=ucb
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/terms?hl=fr&utm_source=ucb
2024-05-10 15:13:25,836 [MainThread  ] [DEBUG] post filter_search_result_urls() link: None
2024-05-10 15:13:25,836 [MainThread  ] [INFO] No valid search results found on this page.  Moving on...

rombru, May 10 '24 13:05

Thanks for that.

  1. I'm getting results with the pastables you provided from a US IP, but just to check, can you try again with these?
import yagooglesearch

query = "Paris"

client = yagooglesearch.SearchClient(
    query,
    tbs="li:1",
    max_search_result_urls_to_return=20,
    http_429_cool_off_time_in_minutes=45,
    http_429_cool_off_factor=1.5,
    verbosity=5,
    verbose_output=True,
)
client.assign_random_user_agent()

urls = client.search()

# Print the count explicitly; a bare len(urls) shows nothing outside a REPL.
print(len(urls))

for url in urls:
    print(url)
  2. Your source IP is in a European country. There have been some issues with this in the past, so https://github.com/opsdisk/yagooglesearch/blob/master/src/yagooglesearch/__init__.py#L374 was added. Are you able to source the search from a different IP (through a VPS, SSH tunnel, VPN, etc.)?

  3. If you're familiar with inspecting network traffic in browser dev tools (https://developer.chrome.com/docs/devtools/network), you can inspect the cookie value looking for GOOGLE_ABUSE_EXEMPTION, copy/paste that string, and pass it to google_exemption when instantiating yagooglesearch.SearchClient.

  4. It looks like googlesearch is accessing the local cookie jar (https://github.com/MarioVilas/googlesearch/blob/master/googlesearch/__init__.py#L89) and would possibly use the cookies you got, from the screenshot you provided, when you accepted the terms. If you're comfortable in Python, you could try adding code to support that in yagooglesearch. yagooglesearch uses Python requests though, so it would take some research: https://requests.readthedocs.io/en/latest/user/quickstart/#cookies
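For the GOOGLE_ABUSE_EXEMPTION step above, here is a minimal sketch of pulling that value out of a Cookie header copied from the dev tools network tab. It uses only the standard library. The helper name `extract_cookie_value` and the sample header are hypothetical placeholders, not part of yagooglesearch.

```python
from typing import Optional


def extract_cookie_value(cookie_header: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a raw Cookie header, or None."""
    for pair in cookie_header.split(";"):
        # Split on the first "=" only, since cookie values may contain "=".
        key, sep, value = pair.strip().partition("=")
        if sep and key == name:
            return value
    return None


# Placeholder header for illustration; paste the real one from dev tools.
copied_header = "SOCS=CAAaBgiAx_WxBg; GOOGLE_ABUSE_EXEMPTION=ID=abc:TM=1; AEC=xyz"
exemption = extract_cookie_value(copied_header, "GOOGLE_ABUSE_EXEMPTION")
print(exemption)
```

If the function returns a string, that string is what would be passed as google_exemption when instantiating yagooglesearch.SearchClient. If it returns None, the cookie was not set for that request.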

opsdisk avatar May 11 '24 14:05 opsdisk


Thanks, I already tried several options:

  1. It doesn't work, and gives me the same empty result list.
  2. Tried with my US VPN, and it does work.
  3. Haven't been able to find that cookie, since it's not a captcha page I'm not sure if Google set that kind of cookie. From what I saw, I only have a "SOCS" cookie.
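One way to tell a consent page apart from a captcha page is to look at the links in the response: in the debug logs earlier in this thread, the excluded links include a consent.google.com URL. Below is a minimal sketch of that check using only the standard library. The function name `is_consent_interstitial` is hypothetical, and treating that hostname as the marker is an assumption based on those logs, not documented behavior.

```python
from urllib.parse import urlsplit

# Assumption: a link to this host indicates the cookie consent page.
CONSENT_HOST = "consent.google.com"


def is_consent_interstitial(links):
    """Return True if any link in the page points at the consent host."""
    return any(urlsplit(link).hostname == CONSENT_HOST for link in links)


# Links taken from the debug logs in this thread (query strings shortened).
links_from_logs = [
    "https://policies.google.com/technologies/cookies?hl=fr&utm_source=ucb",
    "https://consent.google.com/dl?continue=https://www.google.com/search?hl%3Dfr&gl=NL&hl=fr",
    "https://policies.google.com/privacy?hl=fr&utm_source=ucb",
]
print(is_consent_interstitial(links_from_logs))
```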

I will probably try to look into it a bit more when I have some time.

rombru, May 12 '24 12:05

I haven't run into a "SOCS" cookie yet. I would love to see a screenshot or pastable of what's in it. I wish the library didn't depend on geolocation to work correctly, and I hate saying "Just use your US VPN", but that may be the easiest solution for you right now.
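As a starting point for looking inside the SOCS cookie, here is a minimal sketch that decodes the value from the debug logs earlier in this thread. It assumes the value is unpadded URL-safe base64, which is a guess based on its character set. The helper name `decode_cookie_bytes` is hypothetical, and no interpretation of the resulting bytes is attempted.

```python
import base64


def decode_cookie_bytes(value: str) -> bytes:
    """Decode an unpadded URL-safe base64 string into raw bytes."""
    # Restore the "=" padding that base64 decoding requires.
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


# SOCS value copied from the cookies line in the logs above.
socs_value = "CAAaBgiAx_WxBg"
raw = decode_cookie_bytes(socs_value)
print(raw.hex())
```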

opsdisk, May 24 '24 02:05