
8comic html is empty

Open rickchen16 opened this issue 1 year ago • 6 comments

I downloaded the latest version of ComicCrawler today, but downloading comics from 8comic still fails.

Comic URL: https://www.8comic.com/html/13736.html

Error:
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...

rickchen16 avatar Nov 16 '24 14:11 rickchen16

Which episode is it?

eight04 avatar Nov 19 '24 15:11 eight04

https://www.8comic.com/html/13736.html It looks like the download already fails at episode 0. When I open that URL and click episode 0, the address becomes https://articles.onemoreplace.tw/online/new-13736.html?ch=0

total 305 episode.
Downloading ep 00話
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
    self.init_images(self.ep.current_page - 1)
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
    self.get_images()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
    images = self.mod.get_images(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
    j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
[same traceback as above]
wait 20 seconds...
[same traceback as above]
wait 40 seconds...
[same traceback as above]
wait 80 seconds...
[same traceback as above]
Something bad happened, skip the episode.
Downloading ep 01話
[same traceback as above]
wait 10 seconds...
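
The log above shows the wait doubling after each failure (10, 20, 40, 80 seconds) and the episode being skipped after the fifth failed attempt. A minimal sketch of that retry pattern follows. It is an illustration inferred from the log output only, not comiccrawler's actual code; `run_with_backoff` and its parameters are hypothetical names.

```python
import time


def run_with_backoff(task, initial_wait=10, max_attempts=5, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Returns True on success and False if every attempt failed.
    The wait doubles after each failure, mirroring the log above.
    """
    wait = initial_wait
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return True
        except Exception as err:
            print(f"{type(err).__name__}: {err}")
            if attempt == max_attempts:
                # No more retries: give up on this episode.
                print("Something bad happened, skip the episode.")
                return False
            print(f"wait {wait} seconds...")
            sleep(wait)
            wait *= 2
    return False
```

Because the parser fails the same way on every attempt when the HTML is empty, retrying cannot help here; the loop only delays the skip.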

rickchen16 avatar Nov 23 '24 11:11 rickchen16

It works fine in my test. Check the page source of episode 0 and see whether it contains this snippet: (image)

eight04 avatar Dec 06 '24 00:12 eight04

> It works fine in my test. Check the page source of episode 0 and see whether it contains this snippet: (image)

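I tested this in a Chrome incognito window. If I start from https://www.8comic.com/html/13736.html and click episode 0 to open https://articles.onemoreplace.tw/online/new-13736.html?ch=0, episode 0 opens normally and shows the comic images (image). Viewing the page source, that snippet is present (image).

But if I copy the URL https://articles.onemoreplace.tw/online/new-13736.html?ch=0 and paste it directly into a new incognito tab, the page that opens is this (image) instead of the normal comic page. Viewing the page source in that case, the snippet is missing.

My guess is that the HTML comiccrawler fetches is the second case, the page that is not the comic page.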
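
One way to check this guess outside the browser is to fetch the episode URL twice, once with a Referer header pointing at the 8comic page and once without, and look for the j.js script tag in each response. The sketch below uses only the Python standard library. That the site keys on Referer is an untested assumption (cookies or something else could be responsible), and `build_request`, `has_j_js`, `fetch`, and `compare` are hypothetical helper names, not part of comiccrawler.

```python
import re
import urllib.request

EPISODE_URL = "https://articles.onemoreplace.tw/online/new-13736.html?ch=0"
REFERER = "https://www.8comic.com/html/13736.html"

# Same pattern that eight.py uses to locate the j.js script tag.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def build_request(url, referer=None):
    """Build a GET request, optionally with a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def has_j_js(html):
    """Return True if the HTML contains a script tag whose src includes /j.js."""
    return J_JS_PATTERN.search(html) is not None


def fetch(request, timeout=30):
    """Download the page body; undecodable bytes are replaced, not fatal."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def compare():
    """Fetch the episode page with and without Referer and report the difference."""
    for referer in (REFERER, None):
        html = fetch(build_request(EPISODE_URL, referer))
        print(f"referer={referer!r}: {len(html)} chars, j.js found: {has_j_js(html)}")
```

Calling `compare()` performs two real network requests. If only the request with the Referer header contains the j.js tag, that would support the guess; if both look the same, the difference lies elsewhere.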

rickchen16 avatar Dec 07 '24 05:12 rickchen16

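Try enabling the errorlog:

  1. In setting.ini, set errorlog = true
  2. Launch comiccrawler and start the download
  3. After the error appears, close comiccrawler
  4. The results of the network requests are written to grabber.log, next to setting.ini

If you can edit the code, find eight.py and make the following change: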

diff --git a/comiccrawler/mods/eight.py b/comiccrawler/mods/eight.py
index 815e10a..ffc57ef 100644
--- a/comiccrawler/mods/eight.py
+++ b/comiccrawler/mods/eight.py
@@ -71,6 +71,9 @@ j_js = ""
 lazy_js = ""
 	
 def get_images(html, url):
+	import pathlib
+	pathlib.Path("8comic.html").write_text(html, encoding="utf-8")
+
 	global j_js
 	if not j_js:
 		j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)

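With this change, the HTML source passed to get_images is written to 8comic.html, so when the error occurs the file holds the page that triggered it.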
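
A variation on the same idea, as a rough sketch: check the result of re.search before calling .group(1), and save the HTML only when the tag is missing. This replaces the bare AttributeError with a message that says how much HTML was received. `find_j_js` and its `dump_path` parameter are hypothetical names used for illustration; this is not the code in eight.py.

```python
import pathlib
import re

# Same pattern as in eight.py's get_images.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def find_j_js(html, dump_path="8comic.html"):
    """Return the src of the j.js script tag.

    If the tag is missing, save the HTML for inspection and raise
    a ValueError that reports how many characters were received.
    """
    match = J_JS_PATTERN.search(html)
    if match is None:
        # re.search returned None, so .group(1) would raise AttributeError.
        pathlib.Path(dump_path).write_text(html, encoding="utf-8")
        raise ValueError(
            f"j.js script tag not found in {len(html)} characters of HTML; "
            f"page saved to {dump_path}"
        )
    return match.group(1)
```

An empty response then shows up as "not found in 0 characters of HTML", which separates an empty download from a page whose layout has changed.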

eight04 avatar Dec 09 '24 07:12 eight04

pathlib.Path("8comic.html").write_text(html, encoding="utf-8")

grabber.log

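8comic.html is empty, so instead of attaching the file I am attaching a screenshot (image).

I also printed what get_html and get_images receive in crawler.py. After crawler.py calls self.downloader.html, self.html is still empty, so the html argument of get_images in eight.py is empty too. This differs from what I see when I paste https://articles.onemoreplace.tw/online/new-13736.html?ch=0 directly into a browser incognito window.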

Start downloading 炎炎之消防隊-無限-8comic
total 305 episode.
Downloading ep 00話
[crawler.py][get_html]self.ep.current_url https://articles.onemoreplace.tw/online/new-13736.html?ch=0
[crawler.py][get_html]self.mission.url https://8comic.com/html/13736.html
[crawler.py][get_html]self.html

[crawler.py][get_images]self.html

[eight.py][get_images]html

rickchen16 avatar Dec 14 '24 02:12 rickchen16