spiderman
A general-purpose distributed crawler framework based on scrapy-redis
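For context, a plain Scrapy project usually becomes distributed with scrapy-redis by switching a few settings. The fragment below is a minimal sketch of those standard scrapy-redis settings. It does not claim to be spiderman's own configuration, and the Redis URL is an assumed local instance.

```python
# settings.py (sketch): standard scrapy-redis switches for a distributed crawl.
# Values are illustrative and not taken from spiderman.

# Redis-backed scheduler: queued requests live in Redis and are shared by all workers.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Request fingerprints are stored in Redis, so every worker deduplicates against one set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints in Redis after the spider closes (pause/resume, incremental runs).
SCHEDULER_PERSIST = True

# Shared Redis instance (assumed to be local here).
REDIS_URL = "redis://localhost:6379/0"
```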
Bumps [scrapy](https://github.com/scrapy/scrapy) from 2.6.0 to 2.6.2. Release notes Sourced from scrapy's releases. 2.6.2 Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0....
Bumps [lxml](https://github.com/lxml/lxml) from 4.6.5 to 4.9.1. Changelog Sourced from lxml's changelog. 4.9.1 (2022-07-01) Bugs fixed A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note...
Bumps [numpy](https://github.com/numpy/numpy) from 1.21.0 to 1.22.0. Release notes Sourced from numpy's releases. v1.22.0 NumPy 1.22.0 Release Notes NumPy 1.22.0 is a big release featuring the work of 153 contributors spread...
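```python
def __init__(self):
    super().__init__(spider_name=zhifang_Spider.name)
    self.delete()  # comment out this line if you need deduplication / incremental crawling
    self.headers = {
        # if the site has anti-scraping measures, customize request headers here
    }
    self.cookies = (
        # for multi-account crawling, list multiple cookie strings here
    )
```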
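The `self.cookies` comment suggests keeping several account cookie strings for multi-account crawling. Below is a minimal sketch of one way to use such a tuple: parse each browser-style `k1=v1; k2=v2` string into a dict and rotate through the accounts round-robin. The helpers `parse_cookie_string` and `cookie_rotator` are hypothetical and are not part of spiderman.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def parse_cookie_string(raw: str) -> Dict[str, str]:
    """Turn a browser-style 'k1=v1; k2=v2' cookie string into a dict."""
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty fragments and entries without a value
        name, _, value = part.partition("=")  # split on the first '=' only
        cookies[name.strip()] = value.strip()
    return cookies


def cookie_rotator(raw_cookies: List[str]) -> Iterator[Dict[str, str]]:
    """Return an endless iterator giving one account's cookies per next() call."""
    parsed = [parse_cookie_string(raw) for raw in raw_cookies if raw.strip()]
    if not parsed:
        raise ValueError("at least one non-empty cookie string is required")
    return cycle(parsed)
```

Each call to `next()` on the rotator yields the next account's dict. That dict could then be passed as the `cookies=` argument of `scrapy.Request`, so consecutive requests spread across accounts.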
2023-05-07 23:35:36 [spiderman.model.standalone] ERROR: Crawler execution failed:
2023-05-07 23:35:36 [scrapy.utils.log] INFO: Scrapy 2.6.2 started (bot: SP)
2023-05-07 23:35:36 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.21.0, Twisted 22.10.0,...
Bumps [redis](https://github.com/redis/redis-py) from 3.5.0 to 4.4.4. Release notes Sourced from redis's releases. 4.4.4 Changes Upgrade urgency: SECURITY, contains fixes to security issues. (CVE-2023-28859) - Cancelling an async future does not,...
Bumps [starlette](https://github.com/encode/starlette) from 0.14.2 to 0.25.0. Release notes Sourced from starlette's releases. Version 0.25.0 Fixed Limit the number of fields and files when parsing multipart/form-data on the MultipartParser 8c74c2c and...
At runtime, `make_request_from_data` is never called and Scrapy does not seem to start, even though Redis does contain data. What could cause this?
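A frequent cause is that the data sits under a different key from the one the spider polls. scrapy-redis's default start-URL key template is `%(name)s:start_urls`, filled in with the spider's `name`. The sketch below is a simplified in-memory model, not scrapy-redis's actual code, with a hypothetical `poll_once` helper. It shows that `make_request_from_data` only fires for entries under exactly that key. If spiderman overrides the key, substitute its template.

```python
from collections import deque
from typing import Callable, Dict, List

# Default template scrapy-redis uses for the start-URL key (START_URLS_KEY).
DEFAULT_START_URLS_KEY = "%(name)s:start_urls"


def start_urls_key(spider_name: str, template: str = DEFAULT_START_URLS_KEY) -> str:
    """Return the key a spider polls, e.g. 'myspider' -> 'myspider:start_urls'."""
    return template % {"name": spider_name}


def poll_once(
    store: Dict[str, "deque[str]"],
    spider_name: str,
    make_request_from_data: Callable[[str], str],
    batch_size: int = 16,
) -> List[str]:
    """Simplified model of one polling round (hypothetical, not library code):
    pop up to batch_size entries from the polled key only and turn each one
    into a request via make_request_from_data."""
    queue = store.get(start_urls_key(spider_name))
    requests: List[str] = []
    while queue and len(requests) < batch_size:
        requests.append(make_request_from_data(queue.popleft()))
    return requests
```

Data pushed to a near-miss key such as `myspider:start_url` is simply never read, so the callback never runs. A second common mismatch is the data type. By default the key is read as a list, so entries added with `SADD` are only picked up when `REDIS_START_URLS_AS_SET` is enabled.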
1. At startup: [py.warnings] WARNING: /home/donney/.local/lib/python3.10/site-packages/scrapy/utils/request.py:232: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. It is also the default value. In other words, it is normal to get this...
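As the warning text itself says, this is only a deprecation notice and does not stop the crawl. On Scrapy 2.7 or later it can be silenced by opting in to the newer implementation in the project settings, as in the one-line fragment below. The newer implementation can produce different fingerprints for some requests, so fingerprints persisted by an earlier run (for example, in a Redis dupefilter) may no longer match after the switch.

```python
# settings.py: opt in to the request fingerprinting implementation added in Scrapy 2.7,
# which removes the ScrapyDeprecationWarning about the '2.6' default.
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
```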
data:image/s3,"s3://crabby-images/a3371/a3371c5be898e6d7fc226d6ca01fb64a5d351398" alt="image" 看到框架中 关于下一次请求是丢回到 redis中的 问题: 1.是否可以直接用 reture 和 yield 返回 2.如果1可以的话 跟丢回 redis有什么区别