scrapyrt
Saving scraped items in a feed
Hi! Thanks for your work on Scrapyrt!
I've discovered that spiders served by Scrapyrt don't save their output to the feeds defined in the spider's custom_settings under FEEDS. Is it possible to change this behavior so that spiders served by Scrapyrt respect this setting?
Thanks!
@runa, can you add some sample code that reproduces this, along with more details? I tested with this simple spider:
```python
import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]
    custom_settings = {
        'FEEDS': {
            'items.json': {
                'format': 'json'
            }
        }
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
```
and when I scheduled it with ScrapyRT:
```shell
curl --location 'http://localhost:9080/crawl.json' \
--header 'Content-Type: application/json' \
--data '{
    "request": {
        "url": "https://quotes.toscrape.com/"
    },
    "spider_name": "toscrape-css"
}'
```
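For anyone who prefers to reproduce this from Python instead of curl, here is a minimal sketch that builds the same POST request to the crawl.json endpoint using only the standard library. It assumes ScrapyRT is listening on localhost:9080, as in the curl call above; the helper name `build_crawl_request` is hypothetical and not part of ScrapyRT.

```python
import json
import urllib.request


def build_crawl_request(url, spider_name, base="http://localhost:9080"):
    """Build (but do not send) a POST to ScrapyRT's /crawl.json endpoint.

    Hypothetical helper; mirrors the JSON body of the curl example.
    """
    payload = {
        "request": {"url": url},
        "spider_name": spider_name,
    }
    return urllib.request.Request(
        base + "/crawl.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running ScrapyRT server, e.g.:
# req = build_crawl_request("https://quotes.toscrape.com/", "toscrape-css")
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["status"], len(result["items"]))
```

Building the request separately from sending it makes it easy to inspect the exact body the server receives.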
I see an items.json file generated in the spider project's directory. Is there a specific feed that is failing for you?
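To check whether a feed was actually written, a small sketch like the following can load the file and flag items that lack expected fields. It assumes the relative items.json path resolves against the directory the ScrapyRT process was started from, and that the json feed format writes a single JSON array; `load_json_feed` and `missing_fields` are hypothetical helpers, not Scrapy or ScrapyRT APIs.

```python
import json
from pathlib import Path


def load_json_feed(path="items.json"):
    """Load items from a JSON feed file (assumed to be one JSON array)."""
    feed = Path(path)
    if not feed.exists():
        # Report the absolute path so it's clear which working directory was used
        raise FileNotFoundError(f"no feed file at {feed.resolve()}")
    items = json.loads(feed.read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def missing_fields(items, fields=("text", "author", "tags")):
    """Return indexes of items that lack any of the expected fields."""
    return [i for i, item in enumerate(items) if not all(f in item for f in fields)]
```

If `load_json_feed` raises FileNotFoundError, the printed absolute path shows where the file was expected, which helps tell a missing feed apart from a feed written to a different directory.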