scrapy-playwright
inspect_response not working in spider with scrapy_playwright
Hi all,
I have a simple example below which should work but doesn't.

    import scrapy
    from scrapy.shell import inspect_response


    class AwesomeSpider(scrapy.Spider):
        name = "test-playwright"

        def start_requests(self):
            # Route the request through Playwright
            yield scrapy.Request("https://quotes.toscrape.com/", meta={
                "playwright": True,
            })

        def parse(self, response):
            # Drop into the Scrapy shell to inspect the response
            inspect_response(response, self)
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').get(),
                    'author': quote.css('small.author::text').get(),
                    'tags': quote.css('div.tags a.tag::text').getall(),
                }

It gives the errors in the shell:
..........
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] view(response) View response in a browser
2022-08-15 11:17:16 [scrapy.core.scraper] ERROR: Spider error processing <GET https://quotes.toscrape.com/> (referer: https://fonts.googleapis.com/)
Traceback (most recent call last):
  File "/python/scrapy-projects/rightmove/venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1030, in adapt
    extracted = result.result()
2022-08-15 11:17:16 [py.warnings] WARNING: /python/scrapy-projects/rightmove/venv/lib/python3.9/site-packages/IPython/core/displayhook.py:311: RuntimeWarning: coroutine 'Application.run_async' was never awaited
  gc.collect()
However, everything works fine if I start the Scrapy shell directly from the command line: `scrapy shell 'https://quotes.toscrape.com'`
Any ideas? I'm stumped, I think it's something to do with asyncio. Thanks,
(edited for syntax highlighting)
The traceback is not exactly the same, but the whole situation looks very similar to https://github.com/scrapy/scrapy/issues/5447. I see mentions of IPython in your post, so I'd recommend trying the regular interpreter by setting the `SCRAPY_PYTHON_SHELL=python` environment variable.
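
For anyone who wants to try that suggestion from a script rather than by exporting the variable by hand, here is a minimal sketch. It assumes it is run from the Scrapy project directory and that the spider is named `test-playwright` as in the original post; `build_crawl_env` and `run_crawl` are hypothetical helper names used only for illustration, not part of Scrapy or scrapy-playwright.

```python
import os
import subprocess


def build_crawl_env(base_env=None):
    """Return a copy of the environment with SCRAPY_PYTHON_SHELL set to 'python'.

    With this value Scrapy opens the standard Python console for
    inspect_response() instead of trying IPython first.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SCRAPY_PYTHON_SHELL"] = "python"
    return env


def run_crawl(spider_name="test-playwright"):
    """Run `scrapy crawl <spider_name>` with the adjusted environment.

    Hypothetical helper: assumes the `scrapy` command is on PATH and that
    the current working directory is the Scrapy project.
    """
    return subprocess.run(
        ["scrapy", "crawl", spider_name],
        env=build_crawl_env(),
        check=False,
    )
```

Calling `run_crawl()` should be equivalent to running `SCRAPY_PYTHON_SHELL=python scrapy crawl test-playwright` in a POSIX shell. Whether this avoids the `Application.run_async` warning in this particular setup is untested here; it only applies the workaround suggested above.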