scrapy-splash
An example of http_method and body in splash script
There are examples of using cookies in the docs, but no examples of setting the method and body. I think it would be useful to add one, or perhaps even to add the following class (with a better name). With it, the full capabilities of scrapyjs can be used without digging into Splash scripts:
from scrapy_splash import SplashRequest


class DefaultExecuteSplashRequest(SplashRequest):
    '''
    This is a SplashRequest subclass that uses a minimal default script
    for the execute endpoint with support for POST requests and cookies.
    '''
    SPLASH_SCRIPT = '''
    function last_response_headers(splash)
        local entries = splash:history()
        local last_entry = entries[#entries]
        return last_entry.response.headers
    end

    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go{
            splash.args.url,
            headers=splash.args.headers,
            http_method=splash.args.http_method,
            body=splash.args.body,
        })
        assert(splash:wait(0.5))
        return {
            headers=last_response_headers(splash),
            cookies=splash:get_cookies(),
            html=splash:html(),
        }
    end
    '''

    def __init__(self, *args, **kwargs):
        # Always use the execute endpoint with the default script above.
        kwargs['endpoint'] = 'execute'
        splash_args = kwargs.setdefault('args', {})
        splash_args['lua_source'] = self.SPLASH_SCRIPT
        super(DefaultExecuteSplashRequest, self).__init__(*args, **kwargs)
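For illustration, here is a rough sketch of how a form-encoded POST might be prepared for this class. The helper name `build_post_kwargs`, the example URL, and the form fields are all made up for the example. It also assumes that scrapy-splash forwards the request's method, body, and headers to the script as `splash.args.http_method`, `splash.args.body`, and `splash.args.headers`.

```python
from urllib.parse import urlencode


def build_post_kwargs(url, form):
    """Return keyword arguments for a form-encoded POST request.

    Hypothetical helper, not part of scrapy-splash.
    """
    return {
        'url': url,
        'method': 'POST',
        # urlencode turns the dict into "key=value&key=value".
        'body': urlencode(form),
        'headers': {'Content-Type': 'application/x-www-form-urlencoded'},
    }


kwargs = build_post_kwargs('http://example.com/search',
                           {'q': 'splash', 'page': 1})
print(kwargs['body'])

# Assumed usage inside a spider (requires scrapy and scrapy-splash):
#     yield DefaultExecuteSplashRequest(callback=self.parse_result, **kwargs)
```

Keeping the helper free of Scrapy imports makes it easy to check the encoded body on its own before sending it through Splash.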
Ah, this example is missing http_status support.
Yeah, this makes sense.
For all other endpoints, http_method and body work as-is, but for a Lua script you have to implement them yourself.
The HTTP status code is handled for /execute since https://github.com/scrapy-plugins/scrapy-splash/commit/fa4f287cf2c2524ac1fa201e061e64e5b47d83bf, but in a very limited way: no response body, no headers, no cookies. You're right that it must be handled explicitly in a script to provide a good experience.
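A possible way to handle the status explicitly is sketched below. It assumes that the last `splash:history()` entry exposes the status as `response.status` (HAR format) and that scrapy-splash picks up an `http_status` key from the script result. The names `SPLASH_SCRIPT_WITH_STATUS` and `execute_kwargs` are made up for this example.

```python
# Variant of the default script that also returns the HTTP status.
# Assumption: the last history entry carries `response.status`.
SPLASH_SCRIPT_WITH_STATUS = '''
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    })
    assert(splash:wait(0.5))

    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        http_status=last_response.status,
        headers=last_response.headers,
        cookies=splash:get_cookies(),
        html=splash:html(),
    }
end
'''


def execute_kwargs(script=SPLASH_SCRIPT_WITH_STATUS):
    """Hypothetical helper: SplashRequest kwargs for the execute endpoint."""
    return {'endpoint': 'execute', 'args': {'lua_source': script}}


# Assumed usage inside a spider (requires scrapy and scrapy-splash):
#     yield SplashRequest(url, self.parse_result, **execute_kwargs())
```

Because the script still wraps `splash:go` in `assert`, a failed page load raises a script error instead of returning a table, so this sketch only reports the status of loads that Splash considers successful.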
As for DefaultExecuteSplashRequest, it looks related to https://github.com/scrapinghub/splash/issues/283; I wonder if we should provide a way to use scripts stored in separate .lua files in SplashExecuteRequest or SplashLuaRequest (or in SplashRequest directly).
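To make the .lua file idea concrete, here is a minimal sketch of what loading a script from disk could look like. It is not an existing scrapy-splash feature: `lua_request_kwargs`, the file name `default.lua`, and the demo script are hypothetical, and the result is simply a dict of keyword arguments that could be passed to `SplashRequest`.

```python
import os
import tempfile


def lua_request_kwargs(script_path, **splash_args):
    """Build SplashRequest kwargs from a script stored in a .lua file.

    Hypothetical helper; not part of scrapy-splash.
    """
    with open(script_path, encoding='utf-8') as f:
        lua_source = f.read()
    # Copy the extra Splash arguments so the caller's dict is not mutated.
    args = dict(splash_args)
    args['lua_source'] = lua_source
    return {'endpoint': 'execute', 'args': args}


# Demo with a throwaway script file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'default.lua')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('function main(splash)\n    return splash:html()\nend\n')
    demo_kwargs = lua_request_kwargs(path, wait=0.5)

# Assumed usage inside a spider (requires scrapy and scrapy-splash):
#     yield SplashRequest(url, self.parse_result,
#                         **lua_request_kwargs('scripts/default.lua'))
```

Returning plain keyword arguments keeps the sketch independent of whichever request class ends up owning this feature.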
Ah, I missed that error support, this is nice!
I like the SplashExecuteRequest idea. Making it all composable looks really hard though.
As for covering this in the documentation, shouldn’t it be done in the Splash documentation instead?