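I'm not sure whether this is a Python version problem; I'm currently on 3.11.
I can't upload a screenshot, so I'm pasting the error output directly: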
D:\爬虫\微博\weibo-search\weibo\spiders>scrapy crawl search
2023-05-12 07:45:54 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=%E5%91%A8%E6%9D%B0%E4%BC%A6%20%E6%BC%94%E5%94%B1&typeall=1&suball=1&timescope=custom:2023-04-28-0:2023-05-11-0> (referer: None)
Traceback (most recent call last):
File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\defer.py", line 257, in iter_errback
yield next(it)
^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\python.py", line 312, in __next__
return next(self.data)
^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\python.py", line 312, in __next__
return next(self.data)
^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
for r in iterable:
File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
for r in iterable:
File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\referer.py", line 353, in <genexpr>
return (self._set_referer(r, response) for r in result or ())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
for r in iterable:
File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
for r in iterable:
File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
return (r for r in result or () if self._filter(r, response, spider))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
for r in iterable:
File "D:\爬虫\微博\weibo-search\weibo\spiders\search.py", line 106, in parse
for weibo in self.parse_weibo(response):
File "D:\爬虫\微博\weibo-search\weibo\spiders\search.py", line 419, in parse_weibo
comments_count = re.findall(r'\d+.*', comments_count)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Program Files\python11\Lib\re\__init__.py", line 216, in findall
return _compile(pattern, flags).findall(string)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
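This seems to fail for some users but not for others. You could try an older version and see whether it runs.
Found the cause: the error comes from a parsing problem in the code. After disabling these 2 fields, the data can be crawled.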

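After simply disabling them, I found that not only were the comment count and like count missing, but every field after them was misaligned: the values shifted forward by 2 columns, so field names and values no longer matched.
So I replaced the code marked in red with the block marked in green, and now it crawls normally.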
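
The red/green change itself is not reproduced in this thread, so the following is only a minimal sketch of one way to avoid the `TypeError`, not necessarily the fix that was applied. It assumes the selector returned `None` for `comments_count` (which is what the traceback suggests) and falls back to a default instead of dropping the field, so the output columns stay aligned. The helper names `extract_count` and `build_counts` and the `likes_count` key are hypothetical; only `comments_count` and the `r'\d+.*'` pattern come from the traceback.

```python
import re


def extract_count(raw, default='0'):
    """Return the numeric part of a count label, or `default` if it is missing.

    `raw` is whatever the selector returned; it may be None when the
    element is absent from the page, which is what triggers the TypeError
    in re.findall.
    """
    if raw is None:
        return default
    found = re.findall(r'\d+.*', raw)  # same pattern as in the traceback
    return found[0] if found else default


def build_counts(raw_comments, raw_likes):
    """Hypothetical helper: always emit both keys so later columns do not shift."""
    return {
        'comments_count': extract_count(raw_comments),
        'likes_count': extract_count(raw_likes),  # key name is an assumption
    }


if __name__ == '__main__':
    print(build_counts(None, ' 12'))
```

Keeping both keys with a placeholder value, rather than removing the fields, is what prevents the 2-column shift described above.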
