linovelib2epub

Crawl light novels from some websites and convert them to EPUB.

Results 7 linovelib2epub issues

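Bilinovel (哔哩轻小说) currently applies font-based anti-scraping to the last sentence of each chapter. Example: https://www.bilinovel.com/novel/3825/228977_2.html ![_TTLD6WXF1EO%GMO B)G(R7](https://github.com/lightnovel-center/linovelib2epub/assets/83676393/6ae4442b-533c-4da1-844f-f2bb91b8cba8)

Without the correct font, the paragraph renders as garbled text, so the final epub contains garbled text.

Possible solutions:

1. Render the last sentence with the provided woff font, then run tesseract OCR on it.
2. Recover the font mapping from [stroke information](https://zhuanlan.zhihu.com/p/37838586). (This becomes inconvenient if the site swaps fonts every so often.)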
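Option 2 could be sketched roughly as below, assuming the mapping from obfuscated code points to real characters has already been recovered (for example by comparing glyph outlines against a reference font). The function names and the example mapping are hypothetical and are not part of linovelib2epub; the real site's code points are not known from this report.

```python
def build_translation_table(glyph_map):
    """Convert {obfuscated_char: real_char} into a table usable by str.translate."""
    return {ord(src): dst for src, dst in glyph_map.items()}


def decode_obfuscated(text, glyph_map):
    """Replace every obfuscated character in text; unmapped characters pass through."""
    return text.translate(build_translation_table(glyph_map))


# Hypothetical mapping: private-use code points -> the characters they stand for.
EXAMPLE_GLYPH_MAP = {"\ue001": "你", "\ue002": "好"}

decoded = decode_obfuscated("\ue001\ue002,世界", EXAMPLE_GLYPH_MAP)
print(decoded)
```

If the site rotates fonts, the mapping would have to be rebuilt per font file, which is the drawback the report already notes.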

enhancement

**Describe the bug** After selecting chapters to download, the browser refreshes over and over and then triggers the site's risk control (anti-bot protection). The browser and driver are both confirmed to be the latest version, `124.0.6367.91`.

**To Reproduce**

```python
from linovelib2epub import Linovelib2Epub

if __name__ == '__main__':
    # /path/to/chromedriver
    browser_driver_path = r'C:\Users\Administrator\Desktop\chromedriver-win64\chromedriver.exe'
    linovelib_epub = Linovelib2Epub(book_id=8,
                                    chapter_crawl_delay=3,
                                    page_crawl_delay=2,
                                    select_volume_mode=True,
                                    browser_driver_path=browser_driver_path,
                                    log_level='DEBUG')
    linovelib_epub.run()
```

...

bug
help wanted
tug-of-war (拉锯战)
Already Reproduced

**Describe the bug** A clear and concise description of what the bug is. The browser window that pops up displays normally, but the error message is shown as empty.

**To Reproduce** Code and steps to reproduce (for example branch selection, volume selection, and so on).

Python file:

```python
from linovelib2epub import Linovelib2Epub

if __name__ == "__main__":
    linovelib_epub =...
```

bug

**Describe the bug** A clear and concise description of what the bug is. The currently required pillow version, 9.2.0, does not work on Python 3.12; version 10 or later is needed. (Other packages such as lxml have the same problem.) **Expected behavior** A clear and...

bug

Log from the run:

```
2024-04-04,16:12:18 INFO LinovelibMobileSpider Succeed to get the novel of book_id: 8 linovelib_mobile_spider.py:85
INFO LinovelibMobileSpider book linovelib_mobile_spider.py:95 name:《欢迎来到实力至上主义的教室》
INFO LinovelibMobileSpider Succeed to get the catalog of book_id: linovelib_mobile_spider.py:145 8...
```

bug
tug-of-war (拉锯战)
Already Reproduced

As the title says, the content clearly does not match, while the log shows nothing abnormal. ![image](https://github.com/lightnovel-center/linovelib2epub/assets/11171892/100ff952-077f-4a5b-832e-3ec7486910b1)

The code is as follows:

```py
from linovelib2epub import Linovelib2Epub, TargetSite

bookId = 251
# Raw string so the backslashes in the Windows path are taken literally.
browserPath = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"

if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=bookId, target_site=TargetSite.MASIRO, browser_path=browserPath)
    linovelib_epub.run()
```

Log: ...

bug
tug-of-war (拉锯战)

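Without a rate limit, crawling can easily bring the site down, and it then keeps returning 502 errors. Suggested implementation: allow a base delay to be set via a parameter, and sleep before requesting the next page.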
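The suggestion above might look something like the following minimal sketch. It is not the library's implementation: `polite_delay`, `crawl_pages`, the `jitter` parameter, and the injected `fetch` callable are all hypothetical names chosen for illustration. The random jitter is an extra assumption beyond the proposal, added so requests are not perfectly periodic.

```python
import random
import time


def polite_delay(base_delay, jitter=0.5, sleep=time.sleep, rng=random.random):
    """Sleep for base_delay seconds plus up to `jitter` extra; return the wait used."""
    wait = base_delay + rng() * jitter
    sleep(wait)
    return wait


def crawl_pages(urls, fetch, base_delay=2.0, sleep=time.sleep):
    """Fetch each URL in order, sleeping before every request except the first."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            polite_delay(base_delay, sleep=sleep)
        results.append(fetch(url))
    return results
```

Passing `sleep` and `fetch` in as arguments keeps the loop easy to test without real network calls or real waiting; a caller would supply its own page-fetching function and choose `base_delay` to suit the target site.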