
AttributeError: Can't get attribute 'PythonSpider' on <module '__main__' (built-in)>

Open · tmancini opened this issue 3 years ago • 1 comment

Hey all, this is exactly what I was looking for, but I'm running into a few problems trying to test it out on Windows. Running the following script, I get the error above:

import scrapy
from scrapyscript import Job, Processor

processor = Processor(settings=None)


class PythonSpider(scrapy.spiders.Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        data = response.xpath("//title/text()").extract_first()
        return {'title': data}


job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)

print(results)

When I move the spider into a separate file and import it, the script seems to run without an error, but the results print as an empty list.

import scrapy
from scrapyscript import Job, Processor

from PythonSpider import PythonSpider

settings = scrapy.settings.Settings(values={'LOG_LEVEL': 'WARNING'})
processor = Processor(settings=settings)


job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)

print(results)
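
A possible explanation for why moving the spider into its own file makes the AttributeError go away: pickle stores a class by reference (module name plus class name), not by value, so whichever process unpickles it must be able to find that name in that module. The sketch below reproduces the effect with the standard library only. It assumes, without confirmation from the scrapyscript source, that the spider class is pickled on its way to the worker process; the module name demo_main is hypothetical and stands in for the script's __main__.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script's __main__ module.
demo = types.ModuleType("demo_main")
sys.modules["demo_main"] = demo


class PythonSpider:
    name = "myspider"


# Make the class look as if it had been defined at the top level of demo_main.
PythonSpider.__module__ = "demo_main"
PythonSpider.__qualname__ = "PythonSpider"
demo.PythonSpider = PythonSpider

# The payload holds only the names "demo_main" and "PythonSpider",
# not the class body.
payload = pickle.dumps(PythonSpider)

# Simulate a process whose copy of the module never defined the class.
del demo.PythonSpider
try:
    pickle.loads(payload)
    missing_error = None
except AttributeError as exc:
    missing_error = str(exc)

# Once the name can be found in the module again, unpickling succeeds.
demo.PythonSpider = PythonSpider
restored = pickle.loads(payload)

print(missing_error)
print(restored is PythonSpider)
```

If that assumption holds, a spider defined in an importable file such as PythonSpider.py can be found by the worker process, while one defined only in the launching script cannot. This would account for the first error, but not for the empty results.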

tmancini · Sep 12 '21 13:09

It seems that _item_scraped is never triggered, so the dispatcher call in Processor.__init__() doesn't work. (???)

A temporary workaround is to move dispatcher.disconnect(self._item_scraped, signals.item_scraped) from __init__ to crawl in the Processor class, and then comment out the p.terminate() line in run because of some billiard library (win32) issues.

In general, something seems to be wrong with this library on Windows :(

bsekiewicz · Sep 20 '21 19:09