rickchen16

Results 5 comments of rickchen16

I haven't found a way to get past Cloudflare with requests. Using BeautifulSoup + selenium, Python can access cocomanga and fetch the HTML of the chapter list page, e.g. https://www.cocomanhua.com/15335/ . However, because the HTML has changed, `get_episodes` in oh.py needs to be updated, and `get_images` still fails at `imgs = eval(code)`.

Even when I manage to assemble the image URLs, accessing them directly is still blocked, and I don't yet know how to fetch the image URLs directly, e.g.:

- https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0001.jpg
- https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0003.jpg
- https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/0005.jpg

The segment **`RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0`** looks like it is generated dynamically.

```python
# Get the HTML content of "https://www.cocomanhua.com/15335/"
from bs4 import BeautifulSoup
from selenium import webdriver
...
```
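The three image URLs above differ only in the page file name; the token segment is identical for this chapter. A small helper like the one below (hypothetical, not part of oh.py) pulls that segment out so it can be compared across chapters or sessions. It is a sketch that assumes the `/comic/<comic_id>/<token>/<page>.jpg` layout seen in these examples; it only parses URLs and does not explain why direct access is blocked.

```python
from urllib.parse import urlsplit


def split_image_url(url):
    """Split an image URL into (comic_id, token, filename).

    Assumes the path layout seen in the examples above:
    /comic/<comic_id>/<token>/<page>.jpg
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "comic":
        raise ValueError(f"unexpected image URL layout: {url}")
    _, comic_id, token, filename = parts
    return comic_id, token, filename


BASE = "https://img.cocomanga.com/comic/15335/RnpUVS9ucGdsRlhkN3ZXbklDeWhBbE5kVERsaFo5TUJ5M3JQdFhXTVQxMD0=/"
urls = [BASE + page for page in ("0001.jpg", "0003.jpg", "0005.jpg")]

parsed = [split_image_url(u) for u in urls]
# All three pages share one (comic_id, token) pair; only the file name differs.
assert len({(comic_id, token) for comic_id, token, _ in parsed}) == 1
print([filename for _, _, filename in parsed])  # ['0001.jpg', '0003.jpg', '0005.jpg']
```

Checking whether the token changes between page loads would help confirm that it is generated per session rather than fixed per chapter.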

The `domain` list is also missing `8comic.com`, so even with the updated eight.py there is still an error: `comiccrawler.error.ModuleError: Get module failed: https://8comic.com/html/11011.html`
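To illustrate why a missing entry leads to "Get module failed": if a module is picked by comparing the URL's host name against its domain list (an assumption here; comiccrawler's real lookup may match differently), a list that only contains `www.8comic.com` never matches a bare `8comic.com` URL. `find_module` below is a hypothetical helper, not comiccrawler code.

```python
from urllib.parse import urlsplit


def find_module(url, domains_by_module):
    """Hypothetical lookup: return the first module whose domain list
    contains the URL's host name, or None if nothing matches."""
    host = urlsplit(url).hostname
    for name, domains in domains_by_module.items():
        if host in domains:
            return name
    return None


url = "https://8comic.com/html/11011.html"
print(find_module(url, {"eight": ["www.8comic.com"]}))                # None
print(find_module(url, {"eight": ["www.8comic.com", "8comic.com"]}))  # eight
```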

https://www.8comic.com/html/13736.html : it looks like the download already fails at episode 0. When I open that URL and click episode 0, the address is https://articles.onemoreplace.tw/online/new-13736.html?ch=0 .

```
total 305 episode.
Downloading ep 00話
Traceback (most recent call last):
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
    process()
  File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
    crawler.init()
...
```

> It works fine on my end. Check the source of episode 0 and see whether it contains this snippet: ![image](https://private-user-images.githubusercontent.com/1324510/393044987-cf02e204-6a89-4547-90f6-b891692a2dd3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzM1NDc4MzAsIm5iZiI6MTczMzU0NzUzMCwicGF0aCI6Ii8xMzI0NTEwLzM5MzA0NDk4Ny1jZjAyZTIwNC02YTg5LTQ1NDctOTBmNi1iODkxNjkyYTJkZDMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTIwNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDEyMDdUMDQ1ODUwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzBlOGJkNzhhYjZjZGY2YzExNzIyYWI3NWZjN2U5OWRlODIwMzM3Mjg0MmI2ZDgwZTQ2YWNhOWRkOTQ1ODMyMiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.t35_eUEaGUhLgJDbbFU0Hf2sJ2h8bvzw-mRlw607ikU)

I tried this in a Chrome incognito window. If I open https://articles.onemoreplace.tw/online/new-13736.html?ch=0 by clicking episode 0 on https://www.8comic.com/html/13736.html , episode 0 opens normally and shows the manga images:

![image](https://github.com/user-attachments/assets/24c586e5-df73-418c-be55-18a5a4562094)

Viewing the page source, that snippet is present:

![image](https://github.com/user-attachments/assets/a56c122a-3eeb-4d0a-92c9-d3850d6a6c03)

But if I copy the URL https://articles.onemoreplace.tw/online/new-13736.html?ch=0 and paste it directly into a new incognito tab, the page looks like this instead of the normal manga page:

![image](https://github.com/user-attachments/assets/acc66aff-caf2-48c3-a1a0-14ffb6bbb141)

Viewing the page source now, the snippet is not there. My guess is that the HTML comiccrawler fetches is this second, non-manga page.
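One plausible explanation for the difference between clicking through and pasting the URL is that the site checks the `Referer` header (or a cookie set by the series page); this is an assumption and has not been confirmed here. A minimal way to test the Referer idea outside the browser is to request the episode page with the series page as `Referer`. `build_episode_request` is a hypothetical helper; the two URLs are the ones from the experiment above.

```python
import urllib.request

SERIES_URL = "https://www.8comic.com/html/13736.html"
EPISODE_URL = "https://articles.onemoreplace.tw/online/new-13736.html?ch=0"


def build_episode_request(episode_url, referer):
    """Build (but do not send) a GET request carrying a Referer header,
    mimicking a click-through from the series page."""
    return urllib.request.Request(episode_url, headers={"Referer": referer})


req = build_episode_request(EPISODE_URL, SERIES_URL)
# urllib.request.urlopen(req) would actually send it; whether the site then
# returns the real episode page (with the snippet) is untested.
```

If the response with the Referer contains the snippet and the one without it does not, that would point to a header check rather than a cookie.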

> ```diff
> pathlib.Path("8comic.html").write_text(html, encoding="utf-8")
> ```

[grabber.log](https://github.com/user-attachments/files/18133940/grabber.log)

8comic.html is empty, so instead of attaching the file I've attached a screenshot:

![image](https://github.com/user-attachments/assets/c2969440-7a82-44a3-9410-1a7978b6581d)

I also printed the values obtained in `get_html` and `get_images` in crawler.py. After crawler.py calls `self.downloader.html`, `self.html` is still empty, so the `html` in `get_images` in eight.py is empty as well. This differs from what I see when I paste https://articles.onemoreplace.tw/online/new-13736.html?ch=0 directly into an incognito browser window.

```
Start downloading 炎炎之消防隊-無限-8comic
total 305 episode.
Downloading ep 00話
[crawler.py][get_html]self.ep.current_url...
```
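To tell "the request returned nothing" apart from "the request returned the wrong page", the dump in the diff above could be extended with a couple of checks. This is a minimal sketch; `dump_html` and its `marker` argument are hypothetical names, and `marker` stands in for whatever text from the expected snippet you want to look for.

```python
import pathlib


def dump_html(html, path="8comic.html", marker=None):
    """Write the fetched HTML to disk (like the pathlib line in the diff)
    and return a short diagnosis: "empty", "missing marker", or "ok"."""
    html = html or ""  # treat None the same as an empty string
    pathlib.Path(path).write_text(html, encoding="utf-8")
    if not html.strip():
        return "empty"
    if marker is not None and marker not in html:
        return "missing marker"
    return "ok"
```

Calling it right after `self.downloader.html` would show whether the crawler received nothing at all (`"empty"`, which matches the empty 8comic.html here) or the non-manga page seen in the incognito test (`"missing marker"`).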