ComicCrawler
8comic html is empty
I downloaded the latest version of ComicCrawler today, but downloading comics from 8comic still fails.
Comic URL: https://www.8comic.com/html/13736.html
Error:
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
Which episode is it?
https://www.8comic.com/html/13736.html It looks like the download already fails at episode 0. When I open that URL and click episode 0, the address becomes https://articles.onemoreplace.tw/online/new-13736.html?ch=0
total 305 episode.
Downloading ep 00話
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'group'
wait 20 seconds...
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'group'
wait 40 seconds...
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'group'
wait 80 seconds...
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'group'
Something bad happened, skip the episode.
Downloading ep 01話
Traceback (most recent call last):
  ...
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
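The log above shows the wait doubling (10, 20, 40, 80 seconds) before the episode is skipped on the fifth failure. As a rough illustration of that pattern only, and not comiccrawler's actual error_loop, a retry loop with the same shape could look like the sketch below. The name run_with_backoff and its parameters are hypothetical.

```python
import time


def run_with_backoff(process, first_wait=10, max_tries=5, sleep=time.sleep):
    """Call process(); on failure wait, double the wait, and try again.

    Returns the result of process(), or None once max_tries attempts
    have all failed (the "skip the episode" case in the log).
    """
    wait = first_wait
    for attempt in range(1, max_tries + 1):
        try:
            return process()
        except Exception as err:
            print(f"{type(err).__name__}: {err}")
            if attempt == max_tries:
                print("Something bad happened, skip the episode.")
                return None
            print(f"wait {wait} seconds...")
            sleep(wait)
            wait *= 2
```

Retrying only helps with transient failures. Here every attempt fails with the same AttributeError, which suggests the page content itself is the problem rather than a temporary network error.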
It works fine on my end. Check the page source of episode 0 and see whether it contains this snippet:
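I tested this in a Chrome incognito window.
If I start from
https://www.8comic.com/html/13736.html
and click episode 0, it opens
https://articles.onemoreplace.tw/online/new-13736.html?ch=0
and episode 0 loads normally, with the comic images.
Viewing the page source, that snippet is present.
But if I copy the URL
https://articles.onemoreplace.tw/online/new-13736.html?ch=0
open a new incognito tab directly,
and paste the URL there,
the page that opens
is not the normal comic page.
Viewing the page source at that point, the snippet is missing.
My guess is that the HTML comiccrawler receives is this second case, the page that is not the comic reader.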
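One way to test that guess outside the browser is to fetch the episode URL twice, once bare and once as if arriving from the index page, and check each response for the j.js script tag. The sketch below assumes the difference is driven by the Referer header; that is only a hypothesis, since the site could equally depend on cookies or on JavaScript. The helper names, the User-Agent value, and the UTF-8 decoding are all assumptions.

```python
import re
import urllib.request

# Same pattern that eight.py uses to locate the reader script.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def has_reader_script(html):
    """Return True if the HTML contains the j.js script tag."""
    return J_JS_PATTERN.search(html) is not None


def build_request(url, referer=None):
    """Build a GET request, optionally with a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, referer=None):
    """Download a page and decode it as UTF-8 (the encoding is assumed)."""
    request = build_request(url, referer)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_html on the episode URL with and without referer="https://www.8comic.com/html/13736.html", then comparing len(html) and has_reader_script(html) for the two results, would show whether the Referer alone explains the difference.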
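Try enabling the error log:
- In setting.ini, set errorlog = true
- Start comiccrawler and begin the download
- After the error appears, close comiccrawler
- The results of the network requests are written to grabber.log, next to setting.ini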
If you are able to edit the code, find eight.py and make the following change:
diff --git a/comiccrawler/mods/eight.py b/comiccrawler/mods/eight.py
index 815e10a..ffc57ef 100644
--- a/comiccrawler/mods/eight.py
+++ b/comiccrawler/mods/eight.py
@@ -71,6 +71,9 @@ j_js = ""
 lazy_js = ""
 
 def get_images(html, url):
+    import pathlib
+    pathlib.Path("8comic.html").write_text(html, encoding="utf-8")
+
     global j_js
     if not j_js:
         j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
This way, when the error occurs, the HTML source will have been written to 8comic.html.
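A variant of the same debugging idea is to dump the HTML only when the search fails and to raise an error that says what went wrong, instead of the bare AttributeError from calling .group(1) on None. This is a standalone sketch, not code from comiccrawler; the function name find_j_js and the dump_path parameter are hypothetical.

```python
import pathlib
import re


def find_j_js(html, dump_path="8comic.html"):
    """Return the j.js URL found in html.

    If the script tag is missing, save the HTML for inspection and raise
    a ValueError that reports how much HTML was actually received.
    """
    match = re.search(r'src="([^"]*/j\.js[^"]*)"', html)
    if match is None:
        pathlib.Path(dump_path).write_text(html, encoding="utf-8")
        raise ValueError(
            f"j.js script tag not found; {len(html)} characters of HTML "
            f"saved to {dump_path}"
        )
    return match.group(1)
```

With this shape, an empty response shows up directly in the error message as "0 characters", without having to open the dump file.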
After adding pathlib.Path("8comic.html").write_text(html, encoding="utf-8"), the resulting 8comic.html is empty.
So instead of attaching the file, I'm attaching a screenshot.
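I also printed the values seen in get_html and get_images in crawler.py. After crawler.py calls self.downloader.html, self.html is still empty, so the html passed to get_images in eight.py is empty as well. This differs from what I see when I paste https://articles.onemoreplace.tw/online/new-13736.html?ch=0 directly into an incognito browser window.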
Start downloading 炎炎之消防隊-無限-8comic
total 305 episode.
Downloading ep 00話
[crawler.py][get_html]self.ep.current_url https://articles.onemoreplace.tw/online/new-13736.html?ch=0
[crawler.py][get_html]self.mission.url https://8comic.com/html/13736.html
[crawler.py][get_html]self.html
[crawler.py][get_images]self.html
[eight.py][get_images]html
