scrapy-deltafetch
Facing issue regarding deltafetch
Kindly help me with the issue below.
I tried to crawl data using DeltaFetch, but I am facing the following issue:
My DB file is updated on both runs when I start the crawler with either of these commands:
    $ scrapy crawl quotes -a deltafetch_reset=1
    $ scrapy crawl quotes -a deltafetch_reset=0
My DB file is not updated when I start the crawler with this command:
    $ scrapy crawl quotes
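One possible explanation for both commands behaving the same way, offered as an assumption rather than confirmed behaviour: Scrapy passes -a arguments to the spider as strings, so deltafetch_reset=0 arrives as the string "0", which is truthy in Python. If the middleware only tests the truthiness of that attribute, both commands would request a reset. A minimal sketch of that effect, where FakeSpider and wants_reset are hypothetical names and not part of Scrapy or scrapy-deltafetch:

```python
class FakeSpider:
    """Hypothetical stand-in for a spider; -a arguments become string attributes."""

    def __init__(self, **kwargs):
        # Command-line arguments arrive as strings, e.g. deltafetch_reset="0".
        self.__dict__.update(kwargs)


def wants_reset(spider):
    # Assumption: the middleware only tests the attribute's truthiness.
    return bool(getattr(spider, "deltafetch_reset", False))


reset_with_1 = wants_reset(FakeSpider(deltafetch_reset="1"))
reset_with_0 = wants_reset(FakeSpider(deltafetch_reset="0"))
reset_without_arg = wants_reset(FakeSpider())
print(reset_with_1, reset_with_0, reset_without_arg)
```

If this assumption holds, omitting the argument entirely would be the only way to run without a reset.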
Below are the changes I made in settings.py:

    SPIDER_MIDDLEWARES = {
        'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
        'scrapy_deltafetch.DeltaFetch': 100,
    }

    COOKIES_ENABLED = True
    COOKIES_DEBUG = True
    DELTAFETCH_ENABLED = True
    DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'
    DOTSCRAPY_ENABLED = True
Please find my code below:

    import scrapy
    from selenium import webdriver
    from w3lib.url import url_query_parameter


    class QuotesSpider(scrapy.Spider):
        name = "quotes_git"

        def start_requests(self):
            urls = [
                'https://www.wikipedia.org/',
            ]
            for url in urls:
                yield scrapy.Request(url=url, meta={'deltafetch_key': url_query_parameter(url, 'abc001')}, callback=self.parse)

        def parse(self, response):
            print('testing')
            print(response.url)
            self.driver = webdriver.Chrome('/home/administrator/Downloads/Gopal/Crawling/Crawling/spiders/chromedriver')
            self.driver.get(response.url)
            print('check point1')
            title = self.driver.title
            print(title)
            filename = 'sample_git.txt'
            with open(filename, 'wb') as f:
                f.write(response.url + title)
            print('done')
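
The deltafetch_key in the spider above is read from a query parameter named abc001, but the start URL has no query string. A minimal sketch using only the standard library suggests that such a lookup yields None for this URL, so the request would not carry a meaningful key. Here query_parameter is a hypothetical stand-in for w3lib's url_query_parameter, and the second URL is made up for illustration:

```python
from urllib.parse import parse_qs, urlparse


def query_parameter(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else default


# The start URL from the spider has no query string at all.
key_no_query = query_parameter("https://www.wikipedia.org/", "abc001")
# A made-up URL that does carry the parameter, for comparison.
key_with_query = query_parameter("https://example.com/page?abc001=42", "abc001")
print(key_no_query, key_with_query)
```

If a per-page key is wanted, it would need to come from something the URL actually contains.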