
How to perform a click button with scrapy-selenium?

Open · Houssemaster opened this issue 3 years ago · 6 comments

Hello, I want to perform some actions after getting the response from a page, such as clicking, hovering, scrolling, etc.

Houssemaster avatar Jan 14 '21 13:01 Houssemaster

Requests have an additional meta key, named driver, containing the selenium driver that processed the request. You can perform those actions with it, for example:

import scrapy
from scrapy_selenium import SeleniumRequest


class WhateverSpider(scrapy.Spider):
    name = 'whatever'

    def start_requests(self):
        urls = ['https://www.google.com']
        for url in urls:
            yield SeleniumRequest(
                url=url,
                callback=self.parse,
                wait_time=10)

    def parse(self, response):
        driver = response.request.meta['driver']
        # Do some stuff..
        # Click a button.
        button = driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')
        button.click()
        # Do more stuff

alephsis avatar Jan 29 '21 21:01 alephsis

> Requests have an additional meta key, named driver, containing the selenium driver that processed the request. [...]

Hello, I think your solution solves part of the problem. However, there is still an issue with this snippet, since downloading requests and parsing responses are asynchronous in scrapy. Thus, it is possible that scrapy invoked

driver.get(another_url)

in the middleware's process_request method before reaching the line:

driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')

which means that by the time scrapy reached that line, the page source may already have changed.

rogerlin0330 avatar Feb 04 '21 20:02 rogerlin0330

This can cause problems, since the code is asynchronous.

But there is another solution. You could use the request option wait_until to perform actions like this:

from selenium.webdriver.common.by import By

def some_action(driver):
    if wait_until_conditions:  # some condition of your own
        driver.find_element(By.CLASS_NAME, 'klass')
        # ...
        return True

SeleniumRequest(
    url='http://xxx.ofg',
    wait_until=some_action,
)

# If you forget to return True from the wait_until callback,
# it will be called again and again until the wait times out.

zjonejj avatar Mar 19 '21 06:03 zjonejj

> Hello, I want to perform some actions after getting the response from a page, such as clicking, hovering, scrolling, etc.

I have the same requirement. You can check this repo until the pull request is accepted.

zjonejj avatar Mar 19 '21 09:03 zjonejj

> Requests have an additional meta key, named driver, containing the selenium driver that processed the request. [...]

> However, there is still a problem with this snippet of code since downloading requests and parsing responses are asynchronous in scrapy. [...] at the time scrapy reached that line, the page source may have been changed.

You are right. There is only one driver, so response.request.meta['driver'] may already be handling the current URL, which can differ from response.url. See #22. Any solution to this?

xtan9 avatar Aug 18 '21 19:08 xtan9

get_element_by_xpath is not a Selenium method; change it to find_element_by_xpath.

ppeer avatar Jan 06 '22 16:01 ppeer