lightnovel-crawler
Fix this source novelgate.net
Let us know
Novel URL: https://novelgate.net/cultivation-chat-group7-126/
App Location: PIP
App Version: 3.2.9
Describe this issue
🔊 LOG LEVEL: DEBUG
14:32:49 [DEBUG] (lncrawl.core) Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page=None, query=None, login=None, output_formats=[], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=False, all=False, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=False, close_directly=False, extra={})
14:32:49 [DEBUG] (lncrawl.core.sources) Loading current index data from /home/me/.lncrawl/sources/_index.json
14:32:49 [DEBUG] (lncrawl.core.sources) Current index was already downloaded once
14:32:49 [DEBUG] (lncrawl.core.sources) Saving current index data to /home/me/.lncrawl/sources/_index.json
14:32:49 [DEBUG] (lncrawl.core.sources) Saving current index data to /home/me/.lncrawl/sources/_index.json
➡ Press Ctrl + C to exit
14:32:49 [INFO] (lncrawl.core.app) Initialized App
14:32:49 [DEBUG] (asyncio) Using selector: EpollSelector
? Enter novel page url or query novel: https://novelgate.net/cultivation-chat-group7-126/
14:32:57 [INFO] (lncrawl.bots.console.integration) Detected URL input
14:32:57 [INFO] (lncrawl.core.sources) Initializing crawler for: https://novelgate.net/ [/home/me/novels/nov/lib/python3.10/site-packages/sources/en/n/novelgate.py]
Retrieving novel info...
14:32:57 [DEBUG] (166c67c64cfd1137f2cf8bc7361ab530) Visiting https://novelgate.net/cultivation-chat-group7-126/
14:32:57 [DEBUG] (lncrawl.core.scraper) [GET] https://novelgate.net/cultivation-chat-group7-126/ timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Origin': b'https://novelgate.net', b'Referer': b'https://novelgate.net/', b'User-Agent': b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
14:32:57 [DEBUG] (urllib3.connectionpool) Starting new HTTPS connection (1): novelgate.net:443
14:32:58 [DEBUG] (urllib3.connectionpool) https://novelgate.net:443 "GET /cultivation-chat-group7-126/ HTTP/1.1" 200 None
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel title: Cultivation Chat Group
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel author: Legend of the Paladin
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel cover: https://novelgate.net/images/post/cultivation-chat-group-126.jpg
14:32:58 [DEBUG] (166c67c64cfd1137f2cf8bc7361ab530) 0 chapters and 0 volumes found
❗ Error: No chapters found
<class 'Exception'>
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 107, in start
    raise e
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 101, in start
    _download_novel()
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 85, in _download_novel
    self.app.get_novel_info()
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/core/app.py", line 137, in get_novel_info
    raise Exception("No chapters found")
14:32:58 [INFO] (lncrawl.core.app) App destroyed
The CSS selectors that the crawler uses in novelgate.py appear to be the problem and need to be changed.
The problem seems to arise because the page shows only the latest chapters by default, and you have to press a button to show all of them. The client has to send an additional POST request to get the full chapter list. You can see this in the Network tab of the browser's developer tools: click Show all chapters on the website, then select Fetch/XHR in the filter bar, and the novelgate.net request appears.
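A rough sketch of that extra step, using only the Python standard library. I have not copied the real request here, so the endpoint path and form fields are placeholders (hypothetical) and must be taken from the request shown in the Network tab. The sketch also assumes the response is an HTML fragment of chapter links, which needs to be confirmed. The helper names are made up for illustration and are not part of lncrawl.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Placeholder (hypothetical): replace with the path of the POST request
# that appears under Fetch/XHR after clicking "Show all chapters".
CHAPTER_LIST_ENDPOINT = "/hypothetical-chapter-list-endpoint"


def build_chapter_list_request(novel_url, form_fields):
    """Build (but do not send) the POST request for the full chapter list.

    form_fields is whatever form data the browser sends; the field names
    are unknown here and must be copied from developer tools.
    """
    url = urljoin(novel_url, CHAPTER_LIST_ENDPOINT)
    body = urlencode(form_fields).encode("utf-8")
    headers = {
        "Referer": novel_url,
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")


class ChapterLinkParser(HTMLParser):
    """Collect every <a href="..."> in the response as one chapter entry."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.chapters = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.chapters.append({
                "id": len(self.chapters) + 1,
                "title": "".join(self._text).strip(),
                "url": urljoin(self.base_url, self._href),
            })
            self._href = None


def parse_chapter_links(html, base_url):
    """Return a list of {"id", "title", "url"} dicts, in document order."""
    parser = ChapterLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.chapters
```

To try it, pass the built request to `urllib.request.urlopen`, decode the body, and feed it to `parse_chapter_links`. If the site returns chapters newest-first, the list would need reversing before ids are assigned.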
Please close this issue. @dipu-bd @EliezerYudkowsky