scrapy-splash
Cookie handling with render.html and SplashFormRequest.from_response
I'm using the render.html endpoint with SplashFormRequest.from_response to scrape ASP.NET-based sites, but I can't make consecutive SplashFormRequest.from_response calls without losing the session.
I tried setting cookies through args, meta, and cookiejar, without success. Here is part of my code:
def start_requests(self):
    script = """
    function main(splash, args)
        splash:init_cookies(splash.args.cookies)
        splash.images_enabled = false
        splash:go(args.url)
        splash:wait(3)
        return {
            html = splash:html(),
            cookies = splash:get_cookies(),
        }
    end"""
    request = SplashRequest(url=url, callback=self.parse, endpoint='execute',
                            args={'lua_source': script,
                                  'url': url})
    request.meta['splash']['session_id'] = self.session
    yield request

def parse(self, response):
    request = SplashFormRequest.from_response(
        response, url=url, formdata=data, callback=self.parse2,
        endpoint='render.html', args={'images': 0})
    request.cookies = response.data['cookies']
    request.meta['splash']['session_id'] = self.session
    yield request
Is there a way to make SplashFormRequest.from_response work by setting cookies manually, for example chaining one SplashFormRequest.from_response after another?
Please use Stack Overflow for this type of question.
Please improve the documentation on how to actually set cookies for session handling. There is not a single example of how to pass cookies to a SplashRequest.
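For what it's worth, here is a minimal sketch of how chained form requests might keep a session. As far as I understand the scrapy-splash README, session handling only works with the execute endpoint and a Lua script that accepts a cookies argument and returns a cookies field. render.html returns only HTML, so the second request in the code above would have no way to hand cookies back. The sketch assumes SplashCookiesMiddleware is enabled in settings. The helper splash_session_kwargs, the mixin class, the URL, and the form data are hypothetical names made up for illustration. The scrapy_splash imports are done lazily so the module loads without Scrapy installed.

```python
# Lua script for the execute endpoint: it loads the cookies that scrapy-splash
# passes in, replays the request (including a POST body), and returns the
# updated cookies so the cookies middleware can store them.
LUA_SESSION_SCRIPT = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  splash.images_enabled = false
  assert(splash:go{
    splash.args.url,
    headers=splash.args.headers,
    http_method=splash.args.http_method,
    body=splash.args.body,
  })
  assert(splash:wait(3))
  return {
    url = splash:url(),
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
"""


def splash_session_kwargs(extra_args=None):
    """Keyword arguments shared by every request in the session (hypothetical helper)."""
    args = {"lua_source": LUA_SESSION_SCRIPT}
    args.update(extra_args or {})
    return {"endpoint": "execute", "cache_args": ["lua_source"], "args": args}


class SessionSpiderMixin:
    """Hypothetical mixin; combine it with scrapy.Spider in a real project."""

    login_url = "https://example.com/login.aspx"  # placeholder URL

    def start_requests(self):
        from scrapy_splash import SplashRequest  # lazy import
        yield SplashRequest(self.login_url, callback=self.parse,
                            **splash_session_kwargs())

    def parse(self, response):
        from scrapy_splash import SplashFormRequest  # lazy import
        # from_response builds a POST from the rendered HTML; the Lua script
        # forwards http_method and body, so the form is actually submitted.
        yield SplashFormRequest.from_response(
            response,
            formdata={"field": "value"},  # placeholder form data
            callback=self.parse2,
            **splash_session_kwargs(),
        )

    def parse2(self, response):
        # Further from_response requests can be chained here in the same way.
        self.logger.info("Session page: %s", response.url)
```

With this setup there should be no need to set request.cookies or session_id by hand, since SplashRequest is documented to enable a default session for the execute endpoint. I have not tested this against an ASP.NET site.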
Could the issue be that you need to disable private mode?