scrapyscript
AttributeError: Can't get attribute 'PythonSpider' on <module '__main__' (built-in)>
Hey all, this is exactly what I was looking for, but I'm running into a few problems trying to test it out on Windows. Using the following code, I get the error above:
import scrapy
from scrapyscript import Job, Processor

processor = Processor(settings=None)

class PythonSpider(scrapy.spiders.Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        data = response.xpath("//title/text()").extract_first()
        return {'title': data}

job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)
print(results)
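
A possible explanation for the AttributeError, offered as an assumption rather than something confirmed in this thread: Windows cannot fork, so the worker process starts from a fresh interpreter and receives the spider class through pickle. Pickle stores a class as a reference (module name plus class name), not as its code, so the worker has to be able to find PythonSpider under the same module name. A class defined in `__main__` of the parent may not exist in the worker's `__main__`. The sketch below uses only the standard library and a stand-in class (no Scrapy) to show what the pickle payload actually contains.

```python
import pickle


class PythonSpider:
    """Stand-in for the spider class; Scrapy is not needed for this demo."""
    name = "myspider"


# Pickling a class records where to find it, not the class body itself.
payload = pickle.dumps(PythonSpider)

# The module that the receiving process would have to import or already contain.
print("module recorded for lookup:", PythonSpider.__module__)

# The class name travels in the payload; the attribute value "myspider" does not.
print("class name in payload:", b"PythonSpider" in payload)
print("class body in payload:", b"myspider" in payload)
```

If this reading is right, it would also explain why moving the spider into its own importable module makes the error go away: the worker can then resolve the reference by importing that module.
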
When I move the spider into a separate file and import it, the script seems to run without an error, but the results print as an empty list.
import scrapy
from scrapyscript import Job, Processor
from PythonSpider import PythonSpider
settings = scrapy.settings.Settings(values={'LOG_LEVEL': 'WARNING'})
processor = Processor(settings=settings)
job = Job(PythonSpider, url="http://www.python.org")
results = processor.run(job)
print(results)
It seems that _item_scraped is not triggered, so dispatcher in Processor.__init__() doesn't work. (???)
The temporary solution is to move dispatcher.disconnect(self._item_scraped, signals.item_scraped) from __init__ to crawl in the Processor class, and then comment out the p.terminate() line in run because of some billiard library (win32) issues.
In general, something seems to be wrong with this library on Windows :(