lightnovel-crawler
Fix this source: RANOBES.TOP
Novel URL: https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
App Location: PIP
App Version: v3.3.0
Describe this issue
Scraping the novel at the URL above fails. The crawler cannot find the novel's title on the page, and parse_title in ranobes.py raises an AssertionError.
The crawler then falls back to a browser session, which also fails to start. webdriver_manager requests LATEST_RELEASE_117.0.5938 from chromedriver.storage.googleapis.com, receives a 404, and raises a ValueError, so it appears that no ChromeDriver matching my installed Google Chrome (117.0.5938) is available at that URL.
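My guess at the cause of the second failure, as a minimal sketch rather than a description of how webdriver_manager is actually implemented: as far as I know, ChromeDriver builds for Chrome 115 and later are published through the Chrome for Testing endpoints instead of chromedriver.storage.googleapis.com, so the legacy LATEST_RELEASE lookup returns 404 for Chrome 117. The helper names and the Chrome for Testing host below are my assumptions and do not come from lncrawl or webdriver_manager.

```python
import re

# Legacy host, taken from the log in this report.
LEGACY_HOST = "https://chromedriver.storage.googleapis.com"
# Assumption: Chrome for Testing host that serves LATEST_RELEASE_* files
# for Chrome 115 and later.
CFT_HOST = "https://googlechromelabs.github.io/chrome-for-testing"


def chrome_major(version: str) -> int:
    """Return the major version number from a string like '117.0.5938'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"Unrecognised Chrome version: {version!r}")
    return int(match.group(0))


def latest_release_url(version: str) -> str:
    """Hypothetical helper: pick the host that should know this Chrome build."""
    host = LEGACY_HOST if chrome_major(version) < 115 else CFT_HOST
    return f"{host}/LATEST_RELEASE_{version}"


if __name__ == "__main__":
    # Version reported by WDM in the log below.
    print(latest_release_url("117.0.5938"))
```

If this guess is right, the browser fallback cannot work with the webdriver_manager version that ships with my install until the lookup targets the newer endpoint. The title AssertionError is a separate problem in the ranobes source.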
Here are the complete logs:
01:28:17 [DEBUG] (lncrawl.core)
Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page=None, query=None, login=None, output_formats=[], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=False, all=False, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=False, close_directly=False, extra={})
01:28:17 [DEBUG] (lncrawl.core.sources)
Loading current index data from C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:17 [DEBUG] (lncrawl.core.sources)
Current index was already downloaded once
01:28:17 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:17 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:18 [WARNING] (lncrawl.core.sources)
Module load failed: C:\Users\Administrator\.lncrawl\sources\en\n\novelww.py | No module named 'lncrawl.utils.cleaner'
01:28:18 [INFO] (lncrawl.core.app)
Initialized App
01:28:18 [DEBUG] (asyncio)
Using proactor: IocpProactor
? Enter novel page url or query novel: https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
01:28:23 [INFO] (lncrawl.bots.console.integration)
Detected URL input
01:28:23 [INFO] (lncrawl.core.sources)
Initializing crawler for: https://ranobes.top/ [C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py]
Retrieving novel info...
01:28:23 [DEBUG] (lncrawl.core.scraper)
[GET] https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Origin': b'https://ranobes.top', b'Referer': b'https://ranobes.top/', b'User-Agent': b'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
01:28:23 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): ranobes.top:443
01:28:26 [DEBUG] (urllib3.connectionpool)
https://ranobes.top:443 "GET /novels/1118135-the-first-hunter-v71134.html HTTP/1.1" 200 None
01:28:29 [ERROR] (lncrawl.templates.browser.basic)
Failed in read novel info:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 21, in read_novel_info_in_scraper
self.novel_title = self.parse_title(soup)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 61, in parse_title
assert tag
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 88, in read_novel_info
self.read_novel_info_in_scraper()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 23, in read_novel_info_in_scraper
raise FallbackToBrowser() from e
lncrawl.core.exeptions.FallbackToBrowser
01:28:29 [INFO] (WDM)
====== WebDriver manager ======
01:28:31 [INFO] (WDM)
There is no [win32] chromedriver "latest" for browser google-chrome "117.0.5938" in cache
01:28:31 [INFO] (WDM)
Get LATEST chromedriver version for google-chrome
01:28:31 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): chromedriver.storage.googleapis.com:443
01:28:31 [DEBUG] (urllib3.connectionpool)
https://chromedriver.storage.googleapis.com:443 "GET /LATEST_RELEASE_117.0.5938 HTTP/1.1" 404 200
Exception in thread Thread-1 (read_novel_info):
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 21, in read_novel_info_in_scraper
self.novel_title = self.parse_title(soup)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 61, in parse_title
assert tag
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 88, in read_novel_info
self.read_novel_info_in_scraper()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 23, in read_novel_info_in_scraper
raise FallbackToBrowser() from e
lncrawl.core.exeptions.FallbackToBrowser
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
self.run()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 95, in read_novel_info
self.read_novel_info_in_browser()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 47, in read_novel_info_in_browser
self.visit_novel_page_in_browser()
File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 55, in visit_novel_page_in_browser
self.visit(self.novel_url)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 65, in visit
self._visit(url)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\browser.py", line 155, in visit
self._init_browser()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\browser.py", line 62, in _init_browser
self._driver = create_new(
^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\__init__.py", line 35, in create_new
return create_local(
^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\local.py", line 109, in create_local
executable_path = _acquire_chrome_driver_path()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\local.py", line 28, in _acquire_chrome_driver_path
return ChromeDriverManager().install()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\chrome.py", line 39, in install
driver_path = self._get_driver_path(self.driver)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\manager.py", line 30, in _get_driver_path
file = self._download_manager.download_file(driver.get_driver_download_url())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\drivers\chrome.py", line 40, in get_driver_download_url
driver_version_to_download = self.get_driver_version_to_download()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\driver.py", line 51, in get_driver_version_to_download
self._driver_to_download_version = self._version if self._version not in (None, "latest") else self.get_latest_release_version()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\drivers\chrome.py", line 62, in get_latest_release_version
resp = self._http_client.get(url=latest_release_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\http.py", line 37, in get
self.validate_response(resp)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\http.py", line 16, in validate_response
raise ValueError(f"There is no such driver by url {resp.url}")
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_117.0.5938
! Error: No chapters found
<class 'Exception'>
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 107, in start
raise e
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 101, in start
_download_novel()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 85, in _download_novel
self.app.get_novel_info()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\app.py", line 137, in get_novel_info
raise Exception("No chapters found")
01:28:31 [INFO] (lncrawl.core.app)
App destroyed
I would appreciate any help in resolving these issues. Thank you.