scrapyrt

Saving scraped items in a feed

Open • runa opened this issue 1 year ago • 1 comment

Hi! Thanks for your work on Scrapyrt!

I've discovered that spiders served by Scrapyrt don't save their output to the feeds configured under FEEDS in the spider's custom_settings. Is it possible to change this behavior so that a spider served by Scrapyrt respects this setting?

Thanks!

runa, Jun 02 '23 14:06

@runa can you add some sample code that reproduces this, along with more details? I tested with this simple spider:


import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]
    custom_settings = {
        'FEEDS': {
            'items.json': {
                'format': 'json'
            }
        }
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))

and when I schedule it with ScrapyRT:

curl --location 'http://localhost:9080/crawl.json' \
--header 'Content-Type: application/json' \
--data '{
    "request": {
        "url": "https://quotes.toscrape.com/"
    },
    "spider_name": "toscrape-css"
}'

I see that an items.json file is generated in the spider project's directory. Is there a specific feed that is failing for you?
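For anyone who wants to script the same check, here is a minimal Python sketch of the curl call above, plus a test for whether the feed file appeared. The helper names (build_crawl_request, feed_was_written, run_check) are made up for illustration and are not part of ScrapyRT. It assumes ScrapyRT is listening on localhost:9080 as in the curl command, and that a relative feed path such as items.json ends up in the directory ScrapyRT was started from.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: same endpoint as in the curl command above.
CRAWL_ENDPOINT = "http://localhost:9080/crawl.json"


def build_crawl_request(spider_name, url, endpoint=CRAWL_ENDPOINT):
    """Build the same JSON POST request that the curl command sends."""
    payload = {"request": {"url": url}, "spider_name": spider_name}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def feed_was_written(feed_path):
    """Return True if the feed file exists and is not empty."""
    path = Path(feed_path)
    return path.is_file() and path.stat().st_size > 0


def run_check(feed_path="items.json"):
    """Schedule the spider through ScrapyRT, then report on the feed file.

    Hypothetical helper: requires a running ScrapyRT instance, and assumes
    this script runs from the directory where ScrapyRT was started.
    """
    req = build_crawl_request("toscrape-css", "https://quotes.toscrape.com/")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print("status:", body.get("status"))
    print("items in response:", len(body.get("items", [])))
    print("feed written:", feed_was_written(feed_path))
```

With ScrapyRT running, calling run_check() from the project directory prints the response status, the number of items returned over HTTP, and whether items.json was written. If the response contains items but the feed file is missing, that narrows the problem down to the feed export rather than the crawl itself.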

pawelmhm, Feb 23 '24 07:02