Fix this source: RANOBES.TOP

Open MasuRii opened this issue 1 year ago • 0 comments

Novel URL: https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
App Location: PIP
App Version: v3.3.0

Describe this issue

I am encountering an issue when trying to scrape a novel from the provided URL. The crawler fails to find the novel's title on the page, resulting in an AssertionError.
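For context on the `assert tag` failure, here is a minimal sketch of a title lookup that tries a fallback and raises a descriptive error instead of a bare AssertionError. It does not reproduce the real `parse_title` in ranobes.py: the `TitleFinder` class, the `find_title` helper, and the choice of `<h1>` and `og:title` as lookup targets are all assumptions made for illustration. The logs do not show why the tag was missing from the response.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Hypothetical helper: records the first <h1> text and the og:title meta value."""

    def __init__(self):
        super().__init__()
        self.h1_text = None
        self.og_title = None
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "h1" and self.h1_text is None:
            self._in_h1 = True
        elif tag == "meta" and attr.get("property") == "og:title":
            # Keep the first og:title value seen.
            self.og_title = self.og_title or attr.get("content")

    def handle_data(self, data):
        if self._in_h1:
            self._h1_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            # Collapse runs of whitespace; treat an empty heading as missing.
            self.h1_text = " ".join("".join(self._h1_parts).split()) or None


def find_title(html):
    """Return the page title, preferring <h1> and falling back to og:title."""
    finder = TitleFinder()
    finder.feed(html)
    title = finder.h1_text or finder.og_title
    if not title:
        raise ValueError("No title found; the response may not be the novel page")
    return title
```

An explicit error message like this makes it easier to tell a changed page layout apart from an unexpected response body when reading the logs.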

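Additionally, when the crawler falls back to browser mode, it fails to start a browser session with ChromeDriver. webdriver_manager cannot find a ChromeDriver release matching my installed Google Chrome (117.0.5938): the request to chromedriver.storage.googleapis.com for LATEST_RELEASE_117.0.5938 returns a 404.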
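The 404 in the logs below is consistent with a change in how ChromeDriver is published: starting with Chrome 115, driver builds are listed through the Chrome for Testing endpoints rather than chromedriver.storage.googleapis.com. The following is a minimal sketch of choosing the lookup URL by major version. It assumes the `LATEST_RELEASE_<major.minor.build>` file naming carries over to the newer host, and `latest_release_url` is a hypothetical helper, not part of lncrawl or webdriver_manager.

```python
# Legacy bucket, which lists ChromeDriver releases only up to Chrome 114.
LEGACY_BASE = "https://chromedriver.storage.googleapis.com"
# Chrome for Testing host, used for Chrome 115 and newer.
# Assumption: the LATEST_RELEASE_<build> naming is the same on this host.
CFT_BASE = "https://googlechromelabs.github.io/chrome-for-testing"


def latest_release_url(chrome_version):
    """Build the URL to query for the newest driver matching a Chrome build."""
    parts = chrome_version.split(".")
    major = int(parts[0])
    # Drop any fourth (patch) component: "114.0.5735.90" -> "114.0.5735".
    build = ".".join(parts[:3])
    base = LEGACY_BASE if major < 115 else CFT_BASE
    return f"{base}/LATEST_RELEASE_{build}"


print(latest_release_url("117.0.5938"))
```

If this is the cause, a webdriver_manager release that knows about the newer endpoints would likely be needed; the version pinned by lncrawl v3.3.0 appears to query only the legacy host.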

Here are the complete logs:

01:28:17 [DEBUG] (lncrawl.core)
Arguments: Namespace(log=3, log_file=None, list_sources=False, crawler=[], novel_page=None, query=None, login=None, output_formats=[], add_source_url=False, single=False, multi=False, output_path=None, filename=None, filename_only=False, force=False, ignore=False, all=False, first=None, last=None, page=None, range=None, volumes=None, chapters=None, proxy_file=None, auto_proxy=False, bot=None, shard_id=0, shard_count=1, selenium_grid=None, suppress=False, close_directly=False, extra={})
01:28:17 [DEBUG] (lncrawl.core.sources)
Loading current index data from C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:17 [DEBUG] (lncrawl.core.sources)
Current index was already downloaded once
01:28:17 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:17 [DEBUG] (lncrawl.core.sources)
Saving current index data to C:\Users\Administrator\.lncrawl\sources\_index.json
01:28:18 [WARNING] (lncrawl.core.sources)
Module load failed: C:\Users\Administrator\.lncrawl\sources\en\n\novelww.py | No module named 'lncrawl.utils.cleaner'
01:28:18 [INFO] (lncrawl.core.app)
Initialized App
01:28:18 [DEBUG] (asyncio)
Using proactor: IocpProactor
? Enter novel page url or query novel: https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
01:28:23 [INFO] (lncrawl.bots.console.integration)
Detected URL input
01:28:23 [INFO] (lncrawl.core.sources)
Initializing crawler for: https://ranobes.top/ [C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py]
Retrieving novel info...
01:28:23 [DEBUG] (lncrawl.core.scraper)
[GET] https://ranobes.top/novels/1118135-the-first-hunter-v71134.html
timeout=(7, 301), allow_redirects=True, proxies={}, headers={b'Origin': b'https://ranobes.top', b'Referer': b'https://ranobes.top/', b'User-Agent': b'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/105.0.1343.53'}
01:28:23 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): ranobes.top:443
01:28:26 [DEBUG] (urllib3.connectionpool)
https://ranobes.top:443 "GET /novels/1118135-the-first-hunter-v71134.html HTTP/1.1" 200 None
01:28:29 [ERROR] (lncrawl.templates.browser.basic)
Failed in read novel info:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 21, in read_novel_info_in_scraper
    self.novel_title = self.parse_title(soup)
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 61, in parse_title
    assert tag
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 88, in read_novel_info
    self.read_novel_info_in_scraper()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 23, in read_novel_info_in_scraper
    raise FallbackToBrowser() from e
lncrawl.core.exeptions.FallbackToBrowser
01:28:29 [INFO] (WDM)
====== WebDriver manager ======
01:28:31 [INFO] (WDM)
There is no [win32] chromedriver "latest" for browser google-chrome "117.0.5938" in cache
01:28:31 [INFO] (WDM)
Get LATEST chromedriver version for google-chrome
01:28:31 [DEBUG] (urllib3.connectionpool)
Starting new HTTPS connection (1): chromedriver.storage.googleapis.com:443
01:28:31 [DEBUG] (urllib3.connectionpool)
https://chromedriver.storage.googleapis.com:443 "GET /LATEST_RELEASE_117.0.5938 HTTP/1.1" 404 200
Exception in thread Thread-1 (read_novel_info):
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 21, in read_novel_info_in_scraper
    self.novel_title = self.parse_title(soup)
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 61, in parse_title
    assert tag
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 88, in read_novel_info
    self.read_novel_info_in_scraper()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 23, in read_novel_info_in_scraper
    raise FallbackToBrowser() from e
lncrawl.core.exeptions.FallbackToBrowser
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 95, in read_novel_info
    self.read_novel_info_in_browser()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\general.py", line 47, in read_novel_info_in_browser
    self.visit_novel_page_in_browser()
  File "C:\Users\Administrator\.lncrawl\sources\en\r\ranobes.py", line 55, in visit_novel_page_in_browser
    self.visit(self.novel_url)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\templates\browser\basic.py", line 65, in visit
    self._visit(url)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\browser.py", line 155, in visit
    self._init_browser()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\browser.py", line 62, in _init_browser
    self._driver = create_new(
                   ^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\__init__.py", line 35, in create_new
    return create_local(
           ^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\local.py", line 109, in create_local
    executable_path = _acquire_chrome_driver_path()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\webdriver\local.py", line 28, in _acquire_chrome_driver_path
    return ChromeDriverManager().install()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\chrome.py", line 39, in install
    driver_path = self._get_driver_path(self.driver)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\manager.py", line 30, in _get_driver_path
    file = self._download_manager.download_file(driver.get_driver_download_url())
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\drivers\chrome.py", line 40, in get_driver_download_url
    driver_version_to_download = self.get_driver_version_to_download()
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\driver.py", line 51, in get_driver_version_to_download
    self._driver_to_download_version = self._version if self._version not in (None, "latest") else self.get_latest_release_version()
                                                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\drivers\chrome.py", line 62, in get_latest_release_version
    resp = self._http_client.get(url=latest_release_url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\http.py", line 37, in get
    self.validate_response(resp)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\webdriver_manager\core\http.py", line 16, in validate_response
    raise ValueError(f"There is no such driver by url {resp.url}")
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_117.0.5938
 ! Error: No chapters found
<class 'Exception'>
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 107, in start
    raise e
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 101, in start
    _download_novel()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\bots\console\integration.py", line 85, in _download_novel
    self.app.get_novel_info()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\lncrawl\core\app.py", line 137, in get_novel_info
    raise Exception("No chapters found")
01:28:31 [INFO] (lncrawl.core.app)
App destroyed

I would appreciate any help in resolving these issues. Thank you.

MasuRii, Oct 10 '23 17:10