
TimeoutException scraping Google Maps reviews

Open shafe123 opened this issue 3 years ago • 8 comments

Describe the bug

GHunt encounters a Selenium TimeoutException when trying to pull review contributions from Google Maps.

To Reproduce

Steps to reproduce the behavior:

  1. Cloned repo
  2. Installed prereqs via pip install
  3. Used the Edge extension and method 1 to get cookies
  4. Tool successfully pulled email, gaia ID, contact email, contact phones, services, and channel
  5. Started pulling review contributions, then hung.

System:

  • Windows 10
  • Python version 3.9.6

Additional context

I can provide my personal email address that I used to test privately.

Traceback (most recent call last):
  File "C:\Users\user\Tools\GHunt\ghunt.py", line 38, in <module>
    email_hunt(data)
  File "C:\Users\user\Tools\GHunt\modules\email.py", line 200, in email_hunt
    reviews = gmaps.scrape(gaiaID, client, cookies, config, config.headers, config.regexs["review_loc_by_id"], config.headless)
  File "C:\Users\user\Tools\GHunt\lib\gmaps.py", line 80, in scrape
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.section-scrollbox')))
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
        Ordinal0 [0x009A7413+2389011]
        Ordinal0 [0x00939F61+1941345]
        Ordinal0 [0x0082C658+837208]
        Ordinal0 [0x008591DD+1020381]
        Ordinal0 [0x0085949B+1021083]
        Ordinal0 [0x00886032+1204274]
        Ordinal0 [0x00874194+1130900]
        Ordinal0 [0x00884302+1196802]
        Ordinal0 [0x00873F66+1130342]
        Ordinal0 [0x0084E546+976198]
        Ordinal0 [0x0084F456+980054]
        GetHandleVerifier [0x00B59632+1727522]
        GetHandleVerifier [0x00C0BA4D+2457661]
        GetHandleVerifier [0x00A3EB81+569713]
        GetHandleVerifier [0x00A3DD76+566118]
        Ordinal0 [0x00940B2B+1968939]
        Ordinal0 [0x00945988+1989000]
        Ordinal0 [0x00945A75+1989237]
        Ordinal0 [0x0094ECB1+2026673]
        BaseThreadInitThunk [0x756AFA29+25]
        RtlGetAppContainerNamedObjectPath [0x77227A7E+286]
        RtlGetAppContainerNamedObjectPath [0x77227A4E+238]

shafe123, Apr 20 '22

In lib\gmaps.py, I wrapped lines 79-129 in a try/except block, which is a band-aid for now. If I get some time I'll take a closer look and see if I can help out. My initial guess is that div.section-scrollbox no longer exists on the page.

shafe123, Apr 21 '22
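The try/except band-aid described in the comment above can be sketched roughly like this. It is a minimal illustration, not GHunt's actual code: scrape_reviews_or_empty and scrape_fn are hypothetical names, and it assumes the Maps scraping step can be wrapped as a single callable. On a timeout it logs and returns an empty list, so the rest of the report still prints. The import fallback exists only so the sketch runs without Selenium installed.

```python
from typing import Callable, List

try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so this sketch runs without Selenium installed.
    class TimeoutException(Exception):
        pass


def scrape_reviews_or_empty(scrape_fn: Callable[[], List[str]]) -> List[str]:
    """Run a (hypothetical) review-scraping callable, tolerating a timeout.

    If the page layout changed and the awaited element never shows up,
    WebDriverWait.until raises TimeoutException. Instead of aborting the
    whole run, report it and return no reviews.
    """
    try:
        return scrape_fn()
    except TimeoutException as exc:
        print(f"[-] Skipping Google Maps reviews: {exc!r}")
        return []
```

As with the original band-aid, this hides the breakage rather than fixing it: the reviews section simply comes back empty.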

Well, it just so happens that I couldn't get it out of my head. I reworked the gmaps file so that it finds the appropriate elements now. It works by searching for the span whose aria-label contains "review" and "rating", so I'm not sure whether it will work in languages other than English. I've attached the diff below.

gmaps_diff.txt

shafe123, Apr 21 '22
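The matching rule from the patch is the XPath that shows up in a later traceback in this thread: //span[contains(@aria-label, 'review') and contains(@aria-label, 'rating')]. As an illustration only, here is the same rule written with Python's standard html.parser instead of Selenium, run against made-up HTML (the aria-label strings are invented, not real Google Maps markup). Like XPath contains(), the checks are case-sensitive substring tests, which is why the page language matters.

```python
from html.parser import HTMLParser
from typing import List


class ReviewSpanFinder(HTMLParser):
    """Collect aria-labels of <span> tags matching the rule
    //span[contains(@aria-label, 'review') and contains(@aria-label, 'rating')]."""

    def __init__(self) -> None:
        super().__init__()
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # Attributes without a value come through as None; treat them as "".
        label = dict(attrs).get("aria-label") or ""
        # Case-sensitive substring checks, mirroring XPath contains().
        if "review" in label and "rating" in label:
            self.matches.append(label)


def find_review_labels(html: str) -> List[str]:
    """Return the aria-labels of all matching spans in an HTML string."""
    finder = ReviewSpanFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

On a page rendered in French, for example, a label like "12 avis, note 4,5" would not match, and the lookup would come back empty.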

Hello, I'm looking at this right now.

mxrch, Apr 21 '22

Fixed in the latest commit! Thank you for your patch, @shafe123, it works very well! I also included you as co-author of the commit :) And don't worry about the language when searching for "review" and "rating": I force English by adding "hl=en" to the Maps reviews URL.

mxrch, Apr 21 '22
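Forcing the language as described in the comment above comes down to setting the hl query parameter on the reviews URL. The thread doesn't show the exact URL GHunt builds, so the URLs below are placeholders, and force_english is a hypothetical helper written with Python's urllib.parse. It replaces any existing hl value rather than appending a second one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_english(url: str) -> str:
    """Return `url` with hl=en set, dropping any existing hl parameter."""
    parts = urlsplit(url)
    # Keep every other query parameter, in order, but remove hl.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "hl"
    ]
    query.append(("hl", "en"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, a placeholder URL with no query string simply gains "?hl=en", while one carrying hl=fr has it swapped for hl=en.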

Reopening this issue since a user on Discord had another error after the patch.

mxrch, Apr 21 '22

Traceback (most recent call last):
  File "/home/kali/Desktop/GHunt/ghunt.py", line 38, in <module>
    email_hunt(data)
  File "/home/kali/Desktop/GHunt/modules/email.py", line 200, in email_hunt
    reviews = gmaps.scrape(gaiaID, client, cookies, config, config.headers, config.regexs["review_loc_by_id"], config.headless)
  File "/home/kali/Desktop/GHunt/lib/gmaps.py", line 79, in scrape
    tab_info = driver.find_element(by=By.XPATH, value="//span[contains(@aria-label, 'review') and contains(@aria-label, 'rating')]")
  File "/home/kali/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 1248, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/kali/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 425, in execute
    self.error_handler.check_response(response)
  File "/home/kali/.local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//span[contains(@aria-label, 'review') and contains(@aria-label, 'rating')]"}
  (Session info: headless chrome=100.0.4896.127)
Stacktrace:
#0 0x563a5ff17ad3
#1 0x563a5fc77568
#2 0x563a5fcadc46
#3 0x563a5fcade01
#4 0x563a5fce0a64
#5 0x563a5fccb61d
#6 0x563a5fcde824
#7 0x563a5fccb4e3
#8 0x563a5fca0d1a
#9 0x563a5fca1e75
#10 0x563a5ff45efd
#11 0x563a5ff5f19b
#12 0x563a5ff47c65
#13 0x563a5ff5fec8
#14 0x563a5ff3b360
#15 0x563a5ff7ba68
#16 0x563a5ff7bbe8
#17 0x563a5ff957fd
#18 0x7f45c45cfeae

I am experiencing this error too. I believe it appears whenever the tool reaches the Google Maps reviews scraping step and the account in question actually has Google Maps reviews.

juggel90, Apr 22 '22
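The two tracebacks in this thread fail differently. The first comes from WebDriverWait.until, which polls a condition and raises TimeoutException once its deadline passes. The second comes from a bare driver.find_element call, which raises NoSuchElementException immediately if the element isn't in the DOM yet (unless an implicit wait is configured). Below is a stdlib sketch of the polling pattern WebDriverWait implements; by default WebDriverWait also ignores NoSuchElementException between polls. wait_until is a hypothetical helper, not a Selenium API, and LookupError is only a stand-in default for the "not there yet" exception.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    lookup: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (LookupError,),
) -> T:
    """Call `lookup` until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored` are treated as "not there yet" and
    retried, similar to how WebDriverWait ignores NoSuchElementException.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            value = lookup()
            if value:
                return value
        except ignored as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(poll)
```

With Selenium, the lookup would be something like lambda: driver.find_element(By.XPATH, xpath) with ignored=(NoSuchElementException,). Waiting only helps if the element loads late. If the span is genuinely absent (for example, a different layout or language), a wait just turns the error back into a timeout.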

I managed to find a temporary workaround after chatting with @mxrch (thank you for that), and got the Google reviews showing without any errors!

You will need Python 3.10 or 3.11:

sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10 -y

python3.10 -m venv .venv    # create a Python 3.10 virtual environment
source .venv/bin/activate

git checkout refactor    # run this inside your git clone of GHunt (main branch) first
git pull
python3 -m pip install -r requirements.txt --upgrade    # install the refactor branch's dependencies

python main.py -h    # view the user manual first
python main.py login    # grab the cookies first
python main.py address <email_address>

Done 👍

juggel90, Apr 23 '22
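Because the workaround above depends on running Python 3.10 or newer inside the virtual environment, a quick sanity check can save some confusion. check_python is a hypothetical helper, and the (3, 10) minimum is taken from that comment rather than verified against the refactor branch.

```python
import sys


def check_python(minimum=(3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)
```

Run it with the venv's interpreter (after source .venv/bin/activate) to confirm that python there really is 3.10, not the system 3.9 shown in the tracebacks.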

It seems to freeze at 85 reviews for me.

dadodasyra, Jun 30 '22
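One plausible cause of a freeze like this, which the thread does not confirm, is a scroll loop that keeps going until the number of loaded reviews reaches the total shown on the profile. If Maps stops loading more, such a loop never ends. Below is a hedged sketch of a loop with a stall limit. load_more is a hypothetical callable that scrolls the list once and returns how many reviews are loaded so far. It is not part of GHunt or Selenium.

```python
from typing import Callable


def load_all(load_more: Callable[[], int], expected: int, max_stalls: int = 5) -> int:
    """Scroll until `expected` items are loaded or the count stops growing.

    After `max_stalls` consecutive scrolls with no new items, give up and
    return what was loaded instead of looping forever.
    """
    loaded, stalls = 0, 0
    while loaded < expected and stalls < max_stalls:
        count = load_more()
        if count > loaded:
            loaded, stalls = count, 0  # progress: reset the stall counter
        else:
            stalls += 1  # no new items after this scroll
    return loaded
```

The trade-off is that a slow page can be cut short after max_stalls unproductive scrolls, so a short sleep between scrolls (not shown) would usually accompany it.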