ComicCrawler 8comic html is empty

I downloaded the latest version of ComicCrawler today, but downloading comics from 8comic still fails.

Comic URL: https://www.8comic.com/html/13736.html

Error:

Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...

Nov 16 '24 14:11 rickchen16

Which episode is it?

Nov 19 '24 15:11 eight04

https://www.8comic.com/html/13736.html

It looks like the download already fails at episode 0. When I open the page and click episode 0, the URL is https://articles.onemoreplace.tw/online/new-13736.html?ch=0

total 305 episode.
Downloading ep 00話
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
(same traceback)
wait 20 seconds...
(same traceback)
wait 40 seconds...
(same traceback)
wait 80 seconds...
(same traceback)
Something bad happened, skip the episode.
Downloading ep 01話
(same traceback)
wait 10 seconds...

Nov 23 '24 11:11 rickchen16

It works fine when I test it here. Check the page source of episode 0 and see whether it contains this snippet:

Dec 06 '24 00:12 eight04

> It works fine when I test it here. Check the page source of episode 0 and see whether it contains this snippet:

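I tested this in a Chrome incognito window. If I start from https://www.8comic.com/html/13736.html and click episode 0, https://articles.onemoreplace.tw/online/new-13736.html?ch=0 opens normally and shows the comic images for episode 0. When I view the page source, that snippet is there.

But if I copy the URL https://articles.onemoreplace.tw/online/new-13736.html?ch=0 and paste it directly into a new incognito tab, a different page is shown instead of the normal comic page. When I view the page source in that case, the snippet is missing.

My guess is that the HTML comiccrawler fetches is the second case, the page that is not the comic page.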
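One way to check this guess outside the browser is to request the episode URL twice, once with no extra headers and once with a Referer pointing at the comic's index page, and see whether the `j.js` script tag appears in each response. This is only a sketch: the thread does not establish that the Referer header is what the site checks (it could be a cookie or something else), and the helper names `has_j_js`, `fetch`, and `compare_direct_and_referred` are made up for illustration.

```python
import re
import urllib.request

# Same pattern that eight.py uses to locate the j.js script tag.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def has_j_js(html):
    """Return True if the page source contains the j.js script tag."""
    return J_JS_PATTERN.search(html) is not None


def fetch(url, referer=None):
    """Fetch a page, optionally sending a Referer header (assumption:
    the site may treat direct visits and referred visits differently)."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def compare_direct_and_referred(url, referer, fetcher=fetch):
    """Report whether j.js is found with and without the Referer header."""
    return {
        "direct": has_j_js(fetcher(url)),
        "with_referer": has_j_js(fetcher(url, referer)),
    }


if __name__ == "__main__":
    print(compare_direct_and_referred(
        "https://articles.onemoreplace.tw/online/new-13736.html?ch=0",
        "https://www.8comic.com/html/13736.html",
    ))
```

If both results are the same, the Referer is probably not the deciding factor and the difference lies elsewhere.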

Dec 07 '24 05:12 rickchen16

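Try enabling errorlog:

1. In setting.ini, set errorlog = true
2. Start comiccrawler and begin the download
3. After the error appears, close comiccrawler
4. The results of the network requests are written to grabber.log, next to setting.ini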
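If editing setting.ini by hand is inconvenient, the first step could be scripted roughly as follows. This is a sketch under assumptions: it assumes the `errorlog` key lives in the `[DEFAULT]` section, and `enable_errorlog` is a hypothetical helper, not part of ComicCrawler. Note that `configparser` drops comments when it rewrites the file, so keep a backup.

```python
import configparser


def enable_errorlog(path, section="DEFAULT"):
    """Set errorlog = true in the given ini file.

    Assumption: the key belongs in [DEFAULT]; pass another section
    name if your setting.ini is organized differently.
    """
    # interpolation=None so that '%' characters in other values are left alone.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path, encoding="utf-8")
    if section != "DEFAULT" and not config.has_section(section):
        config.add_section(section)
    config[section]["errorlog"] = "true"
    with open(path, "w", encoding="utf-8") as f:
        config.write(f)
```

Call it with the path to your own setting.ini while comiccrawler is closed, for example `enable_errorlog("setting.ini")`.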

If you are able to edit the code, find eight.py and make the following change:

diff --git a/comiccrawler/mods/eight.py b/comiccrawler/mods/eight.py
index 815e10a..ffc57ef 100644
--- a/comiccrawler/mods/eight.py
+++ b/comiccrawler/mods/eight.py
@@ -71,6 +71,9 @@ j_js = ""
 lazy_js = ""
 	
 def get_images(html, url):
+	import pathlib
+	pathlib.Path("8comic.html").write_text(html, encoding="utf-8")
+
 	global j_js
 	if not j_js:
 		j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)

This way, when the error occurs, the HTML source will have been written to 8comic.html
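A variation on the same idea is to dump the HTML only when the pattern is not found, and to raise a more descriptive error than the `AttributeError` from calling `.group` on `None`. The following is a standalone sketch, not a patch to ComicCrawler; `find_j_js` and its `dump_path` parameter are hypothetical names.

```python
import pathlib
import re


def find_j_js(html, dump_path="8comic.html"):
    """Return the j.js script URL found in html.

    If the script tag is missing, write the HTML to dump_path for
    inspection and raise ValueError instead of failing on None.
    """
    match = re.search(r'src="([^"]*/j\.js[^"]*)"', html)
    if match is None:
        pathlib.Path(dump_path).write_text(html, encoding="utf-8")
        raise ValueError(
            f"j.js script tag not found; HTML ({len(html)} chars) "
            f"written to {dump_path}"
        )
    return match.group(1)
```

With an empty response, the error message would report 0 characters, which makes the "html is empty" case visible straight from the log.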

Dec 09 '24 07:12 eight04

pathlib.Path("8comic.html").write_text(html, encoding="utf-8")

grabber.log

8comic.html is empty, so I am not attaching the file; I have attached a screenshot instead.

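I also printed what get_html and get_images in crawler.py receive. After crawler.py calls self.downloader.html, self.html is still empty, so the html passed to get_images in eight.py is empty as well. This is different from what I see when I paste https://articles.onemoreplace.tw/online/new-13736.html?ch=0 directly into a browser incognito window.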

Start downloading 炎炎之消防隊-無限-8comic
total 305 episode.
Downloading ep 00話
[crawler.py][get_html]self.ep.current_url https://articles.onemoreplace.tw/online/new-13736.html?ch=0
[crawler.py][get_html]self.mission.url https://8comic.com/html/13736.html
[crawler.py][get_html]self.html

[crawler.py][get_images]self.html

[eight.py][get_images]html

Dec 14 '24 02:12 rickchen16
