lightnovel-crawler

Fix this source novelgate.net

Open EliezerYudkowsky opened this issue 1 year ago • 3 comments

Let us know

Novel URL: https://novelgate.net/cultivation-chat-group7-126/
App Location: PIP
App Version: 3.2.9

Describe this issue

                      🔊 LOG LEVEL: DEBUG

14:32:49 [DEBUG] (lncrawl.core) Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page=None, query=None, login=None, output_formats=[], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=False, all=False, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=False, close_directly=False, extra={})
14:32:49 [DEBUG] (lncrawl.core.sources) Loading current index data from /home/me/.lncrawl/sources/_index.json
14:32:49 [DEBUG] (lncrawl.core.sources) Current index was already downloaded once
14:32:49 [DEBUG] (lncrawl.core.sources) Saving current index data to /home/me/.lncrawl/sources/_index.json
14:32:49 [DEBUG] (lncrawl.core.sources) Saving current index data to /home/me/.lncrawl/sources/_index.json

➡ Press Ctrl + C to exit

14:32:49 [INFO] (lncrawl.core.app) Initialized App
14:32:49 [DEBUG] (asyncio) Using selector: EpollSelector
? Enter novel page url or query novel: https://novelgate.net/cultivation-chat-group7-126/
14:32:57 [INFO] (lncrawl.bots.console.integration) Detected URL input
14:32:57 [INFO] (lncrawl.core.sources) Initializing crawler for: https://novelgate.net/ [/home/me/novels/nov/lib/python3.10/site-packages/sources/en/n/novelgate.py]
Retrieving novel info...
14:32:57 [DEBUG] (166c67c64cfd1137f2cf8bc7361ab530) Visiting https://novelgate.net/cultivation-chat-group7-126/
14:32:57 [DEBUG] (lncrawl.core.scraper) [GET] https://novelgate.net/cultivation-chat-group7-126/ timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Origin': b'https://novelgate.net', b'Referer': b'https://novelgate.net/', b'User-Agent': b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
14:32:57 [DEBUG] (urllib3.connectionpool) Starting new HTTPS connection (1): novelgate.net:443
14:32:58 [DEBUG] (urllib3.connectionpool) https://novelgate.net:443 "GET /cultivation-chat-group7-126/ HTTP/1.1" 200 None
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel title: Cultivation Chat Group
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel author: Legend of the Paladin
14:32:58 [INFO] (166c67c64cfd1137f2cf8bc7361ab530) Novel cover: https://novelgate.net/images/post/cultivation-chat-group-126.jpg
14:32:58 [DEBUG] (166c67c64cfd1137f2cf8bc7361ab530) 0 chapters and 0 volumes found

❗ Error: No chapters found
<class 'Exception'>
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 107, in start
    raise e
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 101, in start
    _download_novel()
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/bots/console/integration.py", line 85, in _download_novel
    self.app.get_novel_info()
  File "/home/me/novels/nov/lib/python3.10/site-packages/lncrawl/core/app.py", line 137, in get_novel_info
    raise Exception("No chapters found")

14:32:58 [INFO] (lncrawl.core.app) App destroyed

EliezerYudkowsky, Aug 22 '23 21:08

The problem is with the CSS selectors the crawler uses: the selectors in novelgate.py need to be changed.

zGadli, Aug 23 '23 05:08

It seems the problem arises because the site shows only the latest chapters by default, and you have to press a button to show all of them. The client has to send an additional POST request to get the full chapter list. To see this, open the Network tab in developer tools and click "Show all chapters" on the website. Then click Fetch/XHR in the filter bar and you will see the request to novelgate.net.

image
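For anyone patching novelgate.py, here is a rough sketch of that two-step flow using only the standard library: build the extra POST request, then pull chapter links out of the HTML it returns. The endpoint path, the `novel_id` form field, and the helper names are all placeholders I made up for illustration. The real values have to be copied from the request shown in the Network tab, and the actual response may not be plain HTML with `<a>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "https://novelgate.net/"

# ASSUMPTION: placeholder path, not the site's real endpoint.
# Replace it with the URL of the request seen under Fetch/XHR.
CHAPTER_LIST_ENDPOINT = "/hypothetical-chapter-list-endpoint"


def build_chapter_list_request(novel_id: str) -> Request:
    """Build (but do not send) the extra POST request for the full chapter list."""
    # ASSUMPTION: "novel_id" is a hypothetical form field name.
    body = urlencode({"novel_id": novel_id}).encode("utf-8")
    return Request(
        urljoin(BASE_URL, CHAPTER_LIST_ENDPOINT),
        data=body,
        method="POST",
        headers={"Referer": BASE_URL},
    )


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute_url) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._href = None  # href of the <a> currently being read, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.chapters.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def parse_chapter_list(html: str):
    """Return the chapter links found in the HTML returned by the POST request."""
    parser = ChapterLinkParser()
    parser.feed(html)
    parser.close()
    return parser.chapters
```

To try it against the live site, the request object can be passed to `urllib.request.urlopen` and the decoded response body handed to `parse_chapter_list`. Inside the crawler itself the same idea would presumably go through the crawler's own request helpers rather than urllib.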

zGadli, Aug 23 '23 05:08

Please close this issue. @dipu-bd @EliezerYudkowsky

zGadli, Aug 28 '23 20:08