
An example of http_method and body in splash script

Open lopuhin opened this issue 8 years ago • 4 comments

There are examples of using cookies in the docs, but no examples of setting the HTTP method and body. I think it would be useful to add them, or perhaps even to add the following class (with a better name). With it, you can use the full capabilities of scrapyjs without digging into Splash scripts:

from scrapy_splash import SplashRequest


class DefaultExecuteSplashRequest(SplashRequest):
    '''
    This is a SplashRequest subclass that uses minimal default script
    for the execute endpoint with support for POST requests and cookies.
    '''
    SPLASH_SCRIPT = '''
    function last_response_headers(splash)
        local entries = splash:history()
        local last_entry = entries[#entries]
        return last_entry.response.headers
    end

    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go{
            splash.args.url,
            headers=splash.args.headers,
            http_method=splash.args.http_method,
            body=splash.args.body,
            })
        assert(splash:wait(0.5))

        return {
            headers=last_response_headers(splash),
            cookies=splash:get_cookies(),
            html=splash:html(),
        }
    end
    '''

    def __init__(self, *args, **kwargs):
        kwargs['endpoint'] = 'execute'
        splash_args = kwargs.setdefault('args', {})
        splash_args['lua_source'] = self.SPLASH_SCRIPT
        super(DefaultExecuteSplashRequest, self).__init__(*args, **kwargs)

lopuhin avatar Apr 04 '16 14:04 lopuhin

Ah, this example is missing http_status support.
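
A minimal sketch of how the script above could also report the HTTP status: it takes the status of the last entry in `splash:history()` and returns it under an `http_status` key, which scrapy-splash is understood to map onto `response.status` for execute results. The names `SPLASH_SCRIPT_WITH_STATUS` and `execute_kwargs` are hypothetical, and the helper only builds keyword arguments, so it runs without Splash installed.

```python
# Hypothetical variant of the script above that also returns the HTTP status.
SPLASH_SCRIPT_WITH_STATUS = '''
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    })
    assert(splash:wait(0.5))

    -- The last history entry describes the final response of the page load.
    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        url=splash:url(),
        headers=last_response.headers,
        http_status=last_response.status,
        cookies=splash:get_cookies(),
        html=splash:html(),
    }
end
'''


def execute_kwargs(**kwargs):
    """Return request keyword arguments that target the execute endpoint."""
    kwargs['endpoint'] = 'execute'
    # Copy the args so the caller's dict is not modified.
    splash_args = dict(kwargs.get('args') or {})
    splash_args['lua_source'] = SPLASH_SCRIPT_WITH_STATUS
    kwargs['args'] = splash_args
    return kwargs
```

The result could be passed on as `SplashRequest(**execute_kwargs(url=...))`. Because `assert(splash:go{...})` aborts the script when the page load fails, the returned status is mostly useful for responses that loaded successfully.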

lopuhin avatar Apr 04 '16 14:04 lopuhin

Yeah, this makes sense.

For all other endpoints http_method and body work as-is, but for a Lua script you have to implement it yourself.

HTTP status code is handled for /execute since https://github.com/scrapy-plugins/scrapy-splash/commit/fa4f287cf2c2524ac1fa201e061e64e5b47d83bf, but in a very limited way: no response body, no headers, no cookies. You're right that it must be handled explicitly in a script to provide a good experience.

As for DefaultExecuteSplashRequest, it looks related to https://github.com/scrapinghub/splash/issues/283; I wonder if we should provide a way to use scripts stored in separate .lua files in SplashExecuteRequest or SplashLuaRequest (or in SplashRequest directly).
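
One way the separate-file idea might look, as a rough sketch rather than an existing scrapy-splash feature: a small helper reads the Lua source from disk and returns the args dict for the execute endpoint. The helper name `lua_source_args` and the example path are hypothetical.

```python
from pathlib import Path


def lua_source_args(script_path, **extra_args):
    """Read a Lua script from disk and return Splash args with lua_source set."""
    source = Path(script_path).read_text(encoding='utf-8')
    args = dict(extra_args)  # keep any other Splash arguments the caller passed
    args['lua_source'] = source
    return args
```

With this, a request could be built as `SplashRequest(url, endpoint='execute', args=lua_source_args('scripts/main.lua'))`, keeping the Lua code out of Python string literals.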

kmike avatar Apr 04 '16 22:04 kmike

Ah, I missed that error support, this is nice!

I like the SplashExecuteRequest idea. Making it all composable looks really hard though.

lopuhin avatar Apr 05 '16 06:04 lopuhin

As for covering this in the documentation, shouldn’t it be done in the Splash documentation instead?

Gallaecio avatar Nov 26 '19 11:11 Gallaecio