
potential fix for google_search()

Open · bowditch-c opened this issue 6 years ago · 5 comments

A suggested fix for google_search()

bowditch-c · Aug 28 '19 05:08

I tried the pull request and it doesn't seem to work for me. Every email I tested is reported as not found in the Google search results, which is not the case!

chrispetrou · Aug 28 '19 09:08

The major change is that the function now performs a Google search using quotes, e.g. “[email protected]”, so it searches for that email exactly as typed. It works for me! Public-facing email addresses return results, whilst private emails don’t. If that’s not exactly the intended behavior, my apologies.

bowditch-c · Aug 28 '19 10:08
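To illustrate the exact-phrase query described above, here is a minimal sketch of building the search URL with Python's standard `urllib.parse`. The function name `exact_phrase_search_url` and the address `jane.doe+news@example.com` are hypothetical. The point is that the surrounding double quotes, the `@` and any `+` all get percent-encoded. Interpolating the raw address into the URL would leave a `+` to be read as a space.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(email):
    """Build a Google search URL that matches `email` as an exact phrase.

    Hypothetical helper: wrapping the term in double quotes asks Google for
    that exact string, and urlencode() (which uses quote_plus) percent-encodes
    the quotes, '@' and '+' so the address reaches Google intact.
    """
    query = urlencode({"q": '"{}"'.format(email)})
    return "https://google.com/search?" + query


if __name__ == "__main__":
    # Made-up address, used only to show the encoding.
    print(exact_phrase_search_url("jane.doe+news@example.com"))
```

With the sample address this produces `https://google.com/search?q=%22jane.doe%2Bnews%40example.com%22`, matching the `%22...%22` quoting used in the patch.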

The function does pretty much what you described, but when I test the following script using your pull request:

```python
import os
import sys
from urllib.parse import quote_plus

from selenium import webdriver

# Run Firefox headless.
os.environ['MOZ_HEADLESS'] = '1'

# Phrases Google shows when a search returns nothing.
NO_RESULTS_PHRASES = ("No results found", "did not match any documents")


def google_search(email):
    # Quote the address for an exact-phrase search and URL-encode it so
    # characters such as '+' and '@' are not mangled.
    endpoint = 'https://google.com/search?q=' + quote_plus('"{}"'.format(email))
    with webdriver.Firefox() as d:
        d.get(endpoint)
        # Test each phrase separately: `"a" or "b" in s` is always truthy
        # because the non-empty string "a" short-circuits the `or`.
        return not any(phrase in d.page_source for phrase in NO_RESULTS_PHRASES)


try:
    email = sys.argv[1]
except IndexError:
    sys.exit(0)

breached = google_search(email)
if breached:
    print("{} shows up on google search results".format(email))
else:
    print("{} doesn't show up on google search results.".format(email))
```

I get positive results for every email I test (by positive I mean not showing up in Google search results). When I use your method manually it works, but through that script it doesn't, for some reason. I've even tried very simple emails that have appeared in thousands of breaches, and it keeps reporting them as safe...

chrispetrou · Aug 28 '19 10:08

The patch ignores a race condition. Google's search results are rendered via JavaScript, and the script does not wait for the DOM to be assembled before reading from it.

cf. https://selenium-python.readthedocs.io/waits.html

jsfan · Aug 29 '19 09:08

Aha! Excellent catch. Thank you! I was stumped. I couldn’t recreate the issue on my end with my set of test emails. An explicit wait should resolve this issue.

bowditch-c · Aug 29 '19 09:08
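A minimal sketch of the explicit wait suggested in the thread, using Selenium's `WebDriverWait` with `expected_conditions.presence_of_element_located` as described in the linked waits documentation. Several parts are assumptions. The locator `id="search"` is a guess about Google's markup and may need adjusting. `page_has_results` and the `timeout` parameter are hypothetical names. Firefox and geckodriver are assumed to be installed. Selenium is imported inside `google_search` so the page check can be used on its own.

```python
from urllib.parse import quote_plus

# Phrases Google shows when a query has no matches (as quoted in the thread).
NO_RESULTS_PHRASES = ("No results found", "did not match any documents")


def page_has_results(page_source):
    """Return True unless the page contains one of the 'no results' phrases."""
    return not any(phrase in page_source for phrase in NO_RESULTS_PHRASES)


def google_search(email, timeout=10):
    """Search Google for `email` as an exact phrase, waiting for the DOM first."""
    # Imported lazily so page_has_results() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")
    endpoint = "https://google.com/search?q=" + quote_plus('"{}"'.format(email))

    with webdriver.Firefox(options=options) as driver:
        driver.get(endpoint)
        # Explicit wait: poll until the element is present in the DOM, or
        # raise selenium.common.exceptions.TimeoutException after `timeout` s.
        # ASSUMPTION: Google wraps its results in an element with id="search".
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "search"))
        )
        return page_has_results(driver.page_source)
```

Waiting on a specific element, rather than sleeping for a fixed time, lets the script read `page_source` as soon as the results are in the DOM and fail loudly (with a timeout) if they never appear.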