
Facing issue regarding deltafetch


Kindly help me with the issue below.

I am trying to crawl data using DeltaFetch, but I am facing the following issue:

My DB file gets updated both times when I run the crawler with either of the commands below:

```
$ scrapy crawl quotes -a deltafetch_reset=1
$ scrapy crawl quotes -a deltafetch_reset=0
```

My DB file does not get updated when I run the command below:

```
$ scrapy crawl quotes
```

Below are the updates I have made in settings.py:

```python
SPIDER_MIDDLEWARES = {
    'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
    'scrapy_deltafetch.DeltaFetch': 100,
}

COOKIES_ENABLED = True
COOKIES_DEBUG = True
DELTAFETCH_ENABLED = True
DELTAFETCH_DIR = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'
DOTSCRAPY_ENABLED = True
```
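For reference, this is how I am checking whether the DB file changes between runs; just a minimal sketch that lists the size and modification time of every file under my DELTAFETCH_DIR (the directory path comes from my settings above, nothing else is assumed):

```python
import os
import time

deltafetch_dir = '/home/administrator/apps/scrapy-deltafetch/Crawling/Crawling/crawl_output'

# Print size and last-modified time of each file in the DeltaFetch directory,
# so I can compare them before and after a crawl.
for name in sorted(os.listdir(deltafetch_dir)):
    path = os.path.join(deltafetch_dir, name)
    print(name, os.path.getsize(path), time.ctime(os.path.getmtime(path)))
```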

Please find my spider code below:

```python
import scrapy
from selenium import webdriver
from w3lib.url import url_query_parameter


class QuotesSpider(scrapy.Spider):
    name = "quotes_git"

    def start_requests(self):
        urls = [
            'https://www.wikipedia.org/',
        ]
        for url in urls:
            # Pass an explicit deltafetch_key for the DeltaFetch middleware
            yield scrapy.Request(
                url=url,
                meta={'deltafetch_key': url_query_parameter(url, 'abc001')},
                callback=self.parse,
            )

    def parse(self, response):
        print('testing')
        print(response.url)

        # Open the same page in Selenium to read its title
        self.driver = webdriver.Chrome('/home/administrator/Downloads/Gopal/Crawling/Crawling/spiders/chromedriver')
        self.driver.get(response.url)
        print('check point1')

        title = self.driver.title
        print(title)

        # Write the URL and title to a text file
        filename = 'sample_git.txt'
        with open(filename, 'w') as f:
            f.write(response.url + title)
        print('done')
```
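In case it is relevant, the deltafetch_key I pass in meta comes from url_query_parameter; a minimal sketch of what that expression evaluates to for my start URL (my understanding is that it returns None here, since the URL has no abc001 query parameter):

```python
from w3lib.url import url_query_parameter

url = 'https://www.wikipedia.org/'

# The URL has no query string, so looking up 'abc001' falls back to the
# default and prints None, i.e. meta becomes {'deltafetch_key': None}.
print(url_query_parameter(url, 'abc001'))
```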

gopal1414 · May 22 '18