
Posts outside the configured date range are crawled; thanks in advance for your answer!

Open limingyang325 opened this issue 1 year ago • 3 comments

If I set both the start date and the end date to 7.17, the crawler also fetches posts from 7.18 in addition to 7.17. It doesn't look like many other users have hit this; could you explain what causes it?

limingyang325 avatar Jul 31 '24 14:07 limingyang325

That may simply be how the Weibo search API returns results.

dataabc avatar Jul 31 '24 16:07 dataabc
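This is consistent with how the s.weibo.com date filter works: the search URL (visible in the error log below) carries an hour-granular `timescope=custom:START:END` parameter whose end bound is the first hour of the day *after* the range, so a one-day window for 7.17 extends to `2024-07-18-0` and can pick up posts timestamped at the boundary. A minimal sketch of constructing such a URL; the function name and date handling are illustrative reconstructions from the logged URLs, not weibo-search's actual code:

```python
from datetime import datetime, timedelta
from urllib.parse import quote

def build_search_url(keyword, start_date, end_date):
    """Build an s.weibo.com search URL for [start_date, end_date] inclusive.

    Illustrative sketch: the timescope end bound names the hour *after*
    the last requested day, matching the URLs seen in the error log.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d")
    # End bound is exclusive at hour granularity, hence the +1 day.
    end = datetime.strptime(end_date, "%Y-%m-%d") + timedelta(days=1)
    timescope = f"custom:{start:%Y-%m-%d}-{start.hour}:{end:%Y-%m-%d}-{end.hour}"
    return (f"https://s.weibo.com/weibo?q={quote(keyword)}"
            f"&typeall=1&suball=1&timescope={timescope}&page=1")

# A request for the single day 2024-07-18 produces the same referer URL
# that appears in the log below:
# https://s.weibo.com/weibo?q=%E6%9A%B4%E9%9B%A8&typeall=1&suball=1&timescope=custom:2024-07-18-0:2024-07-19-0&page=1
```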

Great, thank you! One more question: what causes the error below, which appears after the crawler has been running for a while? I couldn't find a similar issue.

2024-08-01 07:45:51 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=%E6%9A%B4%E9%9B%A8&typeall=1&suball=1&timescope=custom:2024-07-18-6:2024-07-18-7&page=1> (referer: https://s.weibo.com/weibo?q=%E6%9A%B4%E9%9B%A8&typeall=1&suball=1&timescope=custom:2024-07-18-0:2024-07-19-0&page=1)
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "C:\Users\lmy\anaconda3\Lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "C:\Users\lmy\anaconda3\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.com', port=443): Max retries exceeded with url: /ajax/statuses/show?id=Oo4KYzuaR&locale=zh-CN (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\defer.py", line 279, in iter_errback
    yield next(it)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\python.py", line 350, in __next__
    return next(self.data)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\referer.py", line 352, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
  File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    for r in iterable:
  File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 197, in parse_by_hour
    for weibo in self.parse_weibo(response):
  File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 517, in parse_weibo
    weibo["ip"] = self.get_ip(bid)
  File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 271, in get_ip
    response = requests.get(url, headers=self.settings.get('DEFAULT_REQUEST_HEADERS'))
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.com', port=443): Max retries exceeded with url: /ajax/statuses/show?id=Oo4KYzuaR&locale=zh-CN (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))

> That may simply be how the Weibo search API returns results.

limingyang325 avatar Aug 01 '24 04:08 limingyang325
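Regarding the SSLError: the traceback shows `get_ip` issuing a bare `requests.get` with no timeout or retry, so a single transient TLS disconnect (possibly the server dropping connections under load) aborts the whole parse. A stdlib-only retry sketch that could wrap such calls; the helper name, retry count, and backoff values are illustrative assumptions, not part of weibo-search:

```python
import time

def call_with_retry(fn, transient_exceptions, retries=3, backoff=1.0):
    """Call fn(), retrying on the given transient exception types.

    Illustrative helper (not from weibo-search): sleeps linearly longer
    after each failure and re-raises once the retry budget is exhausted.
    """
    for attempt in range(retries):
        try:
            return fn()
        except transient_exceptions:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(backoff * (attempt + 1))
```

In `get_ip` this might look like `call_with_retry(lambda: requests.get(url, headers=headers, timeout=10), (requests.exceptions.SSLError, requests.exceptions.ConnectionError))`; passing a `timeout` also keeps a dead connection from hanging the spider indefinitely.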