
scrapy-splash doesn't render the page, but splash does

Open Ostapp opened this issue 5 years ago • 2 comments

Scrapy-splash does not render the page fully, even though Splash alone does. I want to render the contents of a table with id "grid". I can see that it is rendered correctly in the Splash browser UI with http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=90.0&url=https%3A%2F%2Fwww.homeinspector.org%2FHomeInspectors%2FFind%2FResults%3FMetroAreaID%3D5%26NeighborhoodID%3D%26SearchType%3DMetroArea&lua_source=function+main%28splash%2C+args%29%0D%0A++assert%28splash%3Ago%28args.url%29%29%0D%0A++assert%28splash%3Await%280.5%29%29%0D%0A++return+splash%3Ahtml%28%29%0D%0Aend
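
For readability, the query string of the Splash URL above can be decoded with the Python standard library. This is only a minimal sketch to show which target URL and Lua script Splash was given in the working case; the names `SPLASH_INFO_URL` and `decode_splash_params` are made up for illustration and are not part of Splash or scrapy-splash.

```python
from urllib.parse import urlsplit, parse_qs

# The Splash /info URL from the post, copied verbatim and split across
# several string literals only for line length.
SPLASH_INFO_URL = (
    "http://localhost:8050/info?wait=0.5&images=1&expand=1&timeout=90.0"
    "&url=https%3A%2F%2Fwww.homeinspector.org%2FHomeInspectors%2FFind%2FResults"
    "%3FMetroAreaID%3D5%26NeighborhoodID%3D%26SearchType%3DMetroArea"
    "&lua_source=function+main%28splash%2C+args%29%0D%0A"
    "++assert%28splash%3Ago%28args.url%29%29%0D%0A"
    "++assert%28splash%3Await%280.5%29%29%0D%0A"
    "++return+splash%3Ahtml%28%29%0D%0Aend"
)


def decode_splash_params(info_url):
    """Return the decoded query parameters of a Splash URL as a flat dict.

    Hypothetical helper for this illustration only.
    """
    query = urlsplit(info_url).query
    # parse_qs percent-decodes each value and turns '+' into a space.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = decode_splash_params(SPLASH_INFO_URL)
print(params["url"])         # the page Splash was asked to render
print(params["lua_source"])  # the Lua script Splash ran
```

The decoded `url` value is the results page for `MetroAreaID=5`, and the decoded script is a plain `splash:go`, `splash:wait(0.5)`, `splash:html()` sequence.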

However, it is not rendered at all in my Scrapy spider: the grid is empty.

I am talking about the parse_find_page function.

from scrapy.spiders import CrawlSpider
from scrapy_splash import SplashRequest

script = """
function main(splash, args)
  splash.private_mode_enabled = false
  splash.plugins_enabled = true
  splash.indexeddb_enabled = true
  splash.html5_media_enabled = true
  assert(splash:go(splash.args.url))
  assert(splash:wait(7))
  splash:set_viewport_full()

  return splash:html()
end
"""

class Homeinspectors(CrawlSpider):

    name = 'instructors'

    def start_requests(self):
        # Render the search form through Splash to read the metro area list.
        return [SplashRequest(
            'https://www.homeinspector.org/HomeInspectors/Find',
            self.parse_find_page,
            args={
                'wait': 5,
                'http_method': 'GET',
                'timeout': 30,
            },
        )]

    def parse_find_page(self, response):
        # Skip the first <option>, then request one results page per metro area.
        values = response.xpath('//select[@id="ddlMetroArea"]/option/@value').extract()[1:]
        url = "https://www.homeinspector.org/HomeInspectors/Find/Results?MetroAreaID={}NeighborhoodID=&SearchType=MetroArea"
        for v in values:
            yield SplashRequest(
                url.format(v),
                self.parse_search_results,
                endpoint='execute',
                args={
                    'lua_source': script,
                    'wait': 5,
                    'http_method': 'GET',
                },
                headers={'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"},
            )

    def parse_search_results(self, response):
        # print(response.body)
        instructors_urls = response.xpath('//*[@id="grid"]').extract()
        print(instructors_urls)

Ostapp, Apr 22 '19 23:04
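
One difference is visible in the post itself: the URL that rendered correctly in Splash decodes to `...?MetroAreaID=5&NeighborhoodID=&SearchType=MetroArea`, while the template in `parse_find_page` has no `&` between `{}` and `NeighborhoodID`. Whether this explains the empty grid is not confirmed anywhere in the thread. The sketch below only compares how the two query strings parse; `query_params` and `build_results_url` are hypothetical helpers, not part of Scrapy or scrapy-splash.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

BASE = "https://www.homeinspector.org/HomeInspectors/Find/Results"

# Target URL that rendered correctly in the Splash UI (decoded from the post).
WORKING_URL = BASE + "?MetroAreaID=5&NeighborhoodID=&SearchType=MetroArea"

# Template used in parse_find_page, copied verbatim from the post.
SPIDER_TEMPLATE = BASE + "?MetroAreaID={}NeighborhoodID=&SearchType=MetroArea"


def query_params(url):
    """Parse a URL's query string into (name, value) pairs, keeping blanks."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def build_results_url(metro_area_id):
    """Hypothetical helper: let urlencode place the separators."""
    query = urlencode({
        "MetroAreaID": metro_area_id,
        "NeighborhoodID": "",
        "SearchType": "MetroArea",
    })
    return BASE + "?" + query


working = query_params(WORKING_URL)
from_spider = query_params(SPIDER_TEMPLATE.format(5))

print(working)      # three separate parameters
print(from_spider)  # MetroAreaID has absorbed the NeighborhoodID text
print(build_results_url(5) == WORKING_URL)
```

Building the query with `urlencode` instead of string formatting avoids hand-placing the `&` separators, so the spider would request the same URL that was tested in the Splash UI.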

Can you still reproduce this issue?

Gallaecio, Nov 21 '19 16:11

I'm having the same issue. What should I do?

BravoNatalie, May 17 '20 00:05