Selenium and logging

Open peterlravn opened this issue 3 years ago • 7 comments

We are using Selenium to live-scrape a static website over a couple of hours. Exercise 6 said that we are supposed to log our data collection process in our final exam. Are we supposed to log our data collection when using Selenium? We don't repeatedly request a website, so I'm not sure how to log our data.

peterlravn avatar Aug 21 '20 09:08 peterlravn

hi @peterlravn, yes, you are supposed to log your data collection when using Selenium. The Connector class from the lecture will automatically log each request that you make with Selenium. A rule of thumb is to log each time you request a page and get some new HTML that you want to parse. How come it takes a couple of hours to scrape 1 static page that you request once? :D

jsr-p avatar Aug 21 '20 10:08 jsr-p

We've scraped 5 hours' worth of data, but the requests have not been logged. The log has picked up lots of other requests where we didn't use Selenium. Can we 'write our way out of it' in our paper or should we collect the data once again?

jesperhauch avatar Aug 21 '20 11:08 jesperhauch

We have the same problem with logging: it just does not log what we are doing. Maybe we are doing something wrong? We have tried both:

import scraping_class
logfile = 'log_exam.txt'
connector = scraping_class.Connector(logfile)

and

driver = webdriver.Chrome(executable_path="/Users/ninibertelsen/Downloads/chromedriver", service_args=["--verbose", "--log-path=exam.log"])

Can you see any mistakes, or is there something we have to do manually as well?

All the best, Nini

PS sorry to hijack this issue, but I thought it was silly to make another one about the exact same thing.

ninibertelsen avatar Aug 22 '20 10:08 ninibertelsen

hi everyone, it is important that you use the get method of the Connector class and not the get method of the webdriver.Chrome object. Consider the Connector class from the lectures. When you use Selenium and then call connector.get(), the following method is used: [image]

The method also uses the get method of the webdriver.Chrome object. This is done in the line self.browser.get(url) # use selenium get method. The difference is that the lines following that call write the information to the log file. If you only use connector.browser.get(), nothing will be written to the log file.
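To make the difference concrete, here is a minimal sketch of the idea. It is not the actual Connector class from the lectures: the class name LoggingBrowser, the log columns, and the FakeBrowser stand-in are all hypothetical and only illustrate the pattern of wrapping the browser's get in a method that also writes a log row. The only Selenium members it assumes are driver.get(url) and driver.page_source.

```python
import csv
import os
import time


class LoggingBrowser:
    """Hypothetical stand-in for the lecture's Connector class (not its real code)."""

    def __init__(self, browser, logfile):
        # `browser` is anything with get(url) and page_source,
        # e.g. a selenium webdriver.Chrome() instance.
        self.browser = browser
        self.logfile = logfile
        # Write a header row the first time the log file is created.
        if not os.path.isfile(logfile):
            with open(logfile, "w", newline="") as f:
                csv.writer(f, delimiter=";").writerow(
                    ["timestamp", "url", "seconds", "html_length"]
                )

    def get(self, url):
        start = time.time()
        self.browser.get(url)  # the plain Selenium call; logs nothing by itself
        elapsed = time.time() - start
        html = self.browser.page_source
        # This is the part that is skipped when browser.get() is called directly.
        with open(self.logfile, "a", newline="") as f:
            csv.writer(f, delimiter=";").writerow(
                [time.time(), url, round(elapsed, 3), len(html)]
            )
        return html


class FakeBrowser:
    """Test double with the two members used above, so the sketch runs without Chrome."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        self.page_source = "<html>" + url + "</html>"


# With Selenium installed, the wrapper would be used like this (assumed setup):
#   from selenium import webdriver
#   connector = LoggingBrowser(webdriver.Chrome(), "log_exam.csv")
#   connector.get("https://example.com")          # logged
#   connector.browser.get("https://example.com")  # NOT logged
```

Inside the class, self is simply the instance the method is called on, so self.browser in the method is the same object as connector.browser outside it. Calling connector.get(url) goes through the logging code; calling connector.browser.get(url) bypasses it.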

jsr-p avatar Aug 22 '20 11:08 jsr-p

@jesperhauch I would scrape the data again just to practice using the Connector class in the correct way. But you could probably also just incorporate it into the limitations of your study :)

jsr-p avatar Aug 22 '20 11:08 jsr-p

Thanks, I think the whole connector thing was very confusing, but I think I've got it now :-)

ninibertelsen avatar Aug 22 '20 11:08 ninibertelsen

Could you show an example for Selenium? What is 'self' supposed to be?

annalundsoe avatar Aug 24 '20 09:08 annalundsoe