
Make certificates from original web request response available in SplashRequest

Open ned2 opened this issue 4 years ago • 1 comment

I'd like to be able to record whether the response to the URL being scraped used SSL. The challenge is that inside the parse method that handles the SplashResponse, the response.certificates attribute is populated with the SSL details of the response from Splash itself, rather than those of the original scraped URL's response.

My understanding is that the magic_response=True param causes body, url and http_method attributes of the response object to be set to the values from the scraped URL response.

Is there currently a way to access the certificates attribute from the scraped URL response? Or would this need to be an extension of the magic_response functionality?

Jun 27 '21 00:06 ned2

> Is there currently a way to access the certificates attribute from the scraped URL response? Or would this need to be an extension of the magic_response functionality?

@ned2 I think the first step would be to make sure that the information you need is available in the Splash response. Either you can fish it from the HAR (see the har option at https://splash.readthedocs.io/en/stable/api.html#render-json), or you'll need to write a custom Lua script (see https://github.com/scrapy-plugins/scrapy-splash#examples and the Splash docs) and get this information from Splash. As I understand it, the information returned from Splash would be available in response.data even if magic response is used.
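
A minimal sketch of the HAR route, under some assumptions: the request is sent with endpoint='render.json' and args={'har': 1}, the HAR then shows up under response.data['har'], and it follows the standard HAR 1.2 layout (log.entries[].request.url and response.redirectURL). The helper names final_document_entry and used_tls are hypothetical, not part of scrapy-splash. Note that this only answers "was the final document fetched over https?"; as far as I know, HAR 1.2 has no standard field for certificate details, so those would still need another route.

```python
from urllib.parse import urljoin, urlparse


def final_document_entry(har, start_url):
    """Follow the redirect chain from start_url through the HAR entries.

    Returns the last entry reached, or None if start_url is not in the HAR.
    Hypothetical helper: assumes the HAR 1.2 layout described above.
    """
    entries = har.get("log", {}).get("entries", [])
    # Index entries by request URL; if a URL repeats, the last entry wins.
    by_url = {entry["request"]["url"]: entry for entry in entries}

    current = by_url.get(start_url)
    seen = set()
    while current is not None:
        url = current["request"]["url"]
        seen.add(url)
        redirect = current["response"].get("redirectURL") or ""
        if not redirect:
            return current  # no further redirect: this is the final document
        next_url = urljoin(url, redirect)  # redirectURL may be relative
        if next_url in seen or next_url not in by_url:
            return current  # redirect loop, or the target was not recorded
        current = by_url[next_url]
    return None


def used_tls(har, start_url):
    """True/False if the final document was fetched over https, None if unknown."""
    entry = final_document_entry(har, start_url)
    if entry is None:
        return None
    return urlparse(entry["request"]["url"]).scheme == "https"
```

In a spider callback this might be called as used_tls(response.data['har'], original_url), where original_url is the URL you passed to SplashRequest. One caveat: the lookup is an exact string match, so the URL has to be spelled the way it was recorded in the HAR (for example with a trailing slash).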

Jun 28 '21 10:06 lopuhin