isds2020

Question regarding log and Selenium

Open mortenwurd opened this issue 3 years ago • 3 comments

My group is scraping a website for data every 60 seconds. We use Selenium and call driver.refresh() inside a while loop to update the page. This works fine, but it doesn't add a log entry for each refresh. Is it possible to get a log file like the one produced by connector.get when using Selenium and driver.refresh()?

Thanks.

Morten

mortenwurd, Aug 17 '20 12:08

Hi @mortenwurd, yes :)

import pandas as pd

# Assumes the Connector class is already defined or imported in the session.
connector = Connector('log_file_refresh.csv',
                      connector_type="selenium",
                      path2selenium=r"C:\Users\Joune\Desktop\chromedriver_win32\chromedriver.exe")
url_trustpilot = 'https://www.trustpilot.com/'
browser = connector.browser
connector.get(url_trustpilot, 'first_call')

# Reload the current page through connector.get so that each refresh is written to the log file
for i in range(1, 6):
    connector.get(browser.current_url, f'refresh_{i}')

pd.read_csv('log_file_refresh.csv', sep=";")

This yields the following output: [image]
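
For the 60-second polling loop from the question, the same idea can be wrapped in a small helper. This is only a sketch: `refresh_with_log` and its parameters are hypothetical names, not part of the Connector class, and it assumes only that `connector.get(url, label)` takes a URL and a label as in the snippet above.

```python
import time


def refresh_with_log(get, url, n_refreshes, interval_s=60, sleep=time.sleep):
    """Call get(url, label) n_refreshes times, pausing interval_s seconds between calls.

    `get` is any callable with the signature get(url, label), for example
    connector.get. The function name and its parameters are hypothetical.
    """
    labels = []
    for i in range(1, n_refreshes + 1):
        label = f'refresh_{i}'
        get(url, label)          # one call per refresh, so one log entry per refresh
        labels.append(label)
        if i < n_refreshes:      # no need to wait after the last refresh
            sleep(interval_s)
    return labels
```

With the objects above, this could be called as `refresh_with_log(connector.get, browser.current_url, n_refreshes=5)`. Passing `sleep` as an argument makes it possible to try the loop out without actually waiting.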

jsr-p, Aug 17 '20 14:08

Hi @jsr-p

Thanks! :)

Is it possible to get the Connector to run the browser headless? Normally we just type the following code snippet:

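from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)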

Morten

mortenwurd, Aug 21 '20 19:08

Hi @mortenwurd, yes. Here is a screenshot of the modified Connector class: [image]

And here is the code ready to be copied and pasted:

    if connector_type=='selenium':
      assert path2selenium!='', "You need to specify the path to your chromedriver if you want to use Selenium"
      from selenium import webdriver
      ## HINT: this code starts Chrome, so path2selenium must point to a chromedriver (https://chromedriver.chromium.org/downloads), not a geckodriver (https://github.com/mozilla/geckodriver/releases)
      assert os.path.isfile(path2selenium), 'You need to insert a valid path2selenium: the path to your chromedriver. You can download chromedriver here: https://chromedriver.chromium.org/downloads'

      ##################
      #headless options#
      ##################
      options = webdriver.ChromeOptions()
      options.add_argument('--ignore-certificate-errors')
      options.add_argument('--incognito')
      options.add_argument('--headless')

      # pass the options object when creating the browser
      self.browser = webdriver.Chrome(executable_path=path2selenium,
                                      options=options) # start the browser with the path to the chromedriver.
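
If the headless flag should be switchable rather than hard-coded, one option is to build the argument list in a small helper. A minimal sketch, assuming the hypothetical helper names `chrome_arguments` and `make_chrome_options` (neither exists in the Connector class or in Selenium); only `webdriver.ChromeOptions()` and `add_argument` are actual Selenium calls.

```python
def chrome_arguments(headless=True):
    """Return the Chrome command-line flags used above; '--headless' is optional."""
    args = ['--ignore-certificate-errors', '--incognito']
    if headless:
        args.append('--headless')
    return args


def make_chrome_options(headless=True):
    """Build a ChromeOptions object from the flags (hypothetical helper)."""
    # Imported here so that chrome_arguments can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return options
```

Inside the Connector class, `options = make_chrome_options(headless=False)` could then replace the hard-coded block when a visible browser window is wanted, for example while debugging.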

jsr-p, Aug 22 '20 11:08