
Scripts from the js_snippet folder are not installed via pip

blghtr opened this issue 1 year ago • 7 comments

Hi!

```
× Unexpected error in crawl_web at line 11 in load_js_script (.venv\lib\site-packages\crawl4ai\js_snippet\__init__.py):
Error: Script update_image_dimensions not found in the folder
C:\Users\Gamer\PycharmProjects\scraper.venv\Lib\site-packages\crawl4ai\js_snippet

Code context:
 6      current_script_path = os.path.dirname(os.path.realpath(__file__))
 7      # Get the path of the script to load
 8      script_path = os.path.join(current_script_path, script_name + '.js')
 9      # Check if the script exists
10      if not os.path.exists(script_path):
11 →        raise ValueError(f"Script {script_name} not found in the folder {current_script_path}")
12      # Load the content of the script
13      with open(script_path, 'r') as f:
14          script_content = f.read()
15      return script_content
```

This error is returned when running the Quick Start example.
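The loader in the traceback only checks whether `<script_name>.js` exists next to the package's `js_snippet/__init__.py`, so one way to confirm that a pip install is missing the scripts is to run the same check yourself. This is a minimal sketch: `installed_snippet_dir` and `missing_js_snippets` are hypothetical helpers, not part of crawl4ai, and locating the folder with `importlib.util.find_spec` assumes crawl4ai is installed in the active environment.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def installed_snippet_dir(package: str = "crawl4ai") -> Path:
    """Hypothetical helper: return <site-packages>/<package>/js_snippet.

    find_spec on a top-level name locates the package without importing it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package} is not installed")
    # spec.origin points at the package's __init__.py
    return Path(spec.origin).parent / "js_snippet"


def missing_js_snippets(snippet_dir: Path, script_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: same existence test as load_js_script.

    Returns the names whose <name>.js file is absent from snippet_dir.
    """
    folder = Path(snippet_dir)
    return [name for name in script_names if not (folder / f"{name}.js").exists()]


# Usage (assumes crawl4ai is installed in the active environment):
#     folder = installed_snippet_dir()
#     print(missing_js_snippets(folder, ["update_image_dimensions"]))
```

A non-empty result means the `.js` files were not shipped with the installed package.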

blghtr · Dec 13 '24 16:12

Yes, I have encountered this problem too. The code in the provided Colab notebook does not run and reports the same error. My local installation of 0.4.1 works fine.

1933211129 · Dec 14 '24 00:12

For now, I recommend downloading the JS files from the repo and manually copying them to `/path/to/python/Lib/site-packages/crawl4ai/js_snippet`.
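A minimal sketch of that workaround, assuming you have a local clone of the repository. The clone path in the usage comment is hypothetical, and `copy_js_snippets` is an illustrative helper, not part of crawl4ai.

```python
import shutil
from pathlib import Path
from typing import List


def copy_js_snippets(src_dir: Path, dest_dir: Path) -> List[str]:
    """Hypothetical helper: copy every *.js file from src_dir into dest_dir.

    Creates dest_dir if needed and returns the copied file names in sorted order.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for js_file in sorted(src.glob("*.js")):
        shutil.copy2(js_file, dest / js_file.name)  # preserves timestamps
        copied.append(js_file.name)
    return copied


# Usage (assumes crawl4ai is installed and the repo is cloned to ./crawl4ai-repo):
#     import importlib.util
#     spec = importlib.util.find_spec("crawl4ai")
#     target = Path(spec.origin).parent / "js_snippet"
#     copy_js_snippets(Path("crawl4ai-repo/crawl4ai/js_snippet"), target)
```

Resolving the target through `find_spec` avoids hard-coding the `site-packages` path, which differs between virtualenvs and operating systems.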

requizm · Dec 14 '24 10:12

Also, @unclecode, even after doing this, the `js_only` argument is broken.

Udbhav8 · Dec 15 '24 00:12

> Also, @unclecode, even after doing this, the `js_only` argument is broken.

My problem was coming from this code block:

```python
try:
    # Set up download handling
    if self.browser_config.accept_downloads:
        page.on("download", lambda download: asyncio.create_task(self._handle_download(download)))

    # Handle page navigation and content loading
    if not config.js_only:
        await self.execute_hook('before_goto', page, context=context)

        try:
            response = await page.goto(
                url,
                wait_until=config.wait_until,
                timeout=config.page_timeout
            )
        except Error as e:  # Playwright's Error class
            raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")

        await self.execute_hook('after_goto', page, context=context)

        # page.goto() may return None, in which case this raises AttributeError
        status_code = response.status
        response_headers = response.headers
    else:
        status_code = 200
        response_headers = {}
```

Basically, `response.status` fails because `response` is `None`, so the code probably just needs to handle that case and return a 200 status when there is no response. It also seems that `log_console=True` no longer prints any logs from the executing `js_code`; this worked in 0.4.1.
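A minimal sketch of the guard described above, assuming the only change needed is to fall back to a 200 status and empty headers when `page.goto()` returns `None` (Playwright documents this for navigation to `about:blank` or to the same URL with a different hash). The helper name `status_and_headers` is hypothetical and not part of crawl4ai.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional, Tuple


def status_and_headers(response: Optional[Any]) -> Tuple[int, Dict[str, str]]:
    """Hypothetical helper: read status and headers from a Playwright-style
    response, falling back to (200, {}) when there is no response object."""
    if response is None:
        # page.goto() returned None, e.g. a same-URL hash navigation
        return 200, {}
    return response.status, dict(response.headers)


# Inside the crawler this would replace the two direct attribute reads:
#     response = await page.goto(url, ...)
#     status_code, response_headers = status_and_headers(response)

if __name__ == "__main__":
    # SimpleNamespace stands in for a real Playwright Response here
    print(status_and_headers(None))
    print(status_and_headers(SimpleNamespace(status=404, headers={"server": "demo"})))
```

Defaulting to 200 mirrors the existing `js_only` branch, so callers see the same shape of result on both paths.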

Udbhav8 · Dec 15 '24 00:12

Me too.

wwwrookie · Dec 15 '24 11:12

Hey everyone, please update to 0.4.22. @wwwrookie @requizm @blghtr @Udbhav8 @1933211129

unclecode · Dec 16 '24 07:12

@Udbhav8 I've resolved the issue and will release the fix in version 0.4.23 (or 0.4.3), along with fixes for a few other issues I'm collecting into a patch update. For `js_code`, can you please share your code snippet? Thanks!

unclecode · Dec 16 '24 08:12

Closing this issue due to inactivity. If the problem persists in newer versions, please reopen it.

aravindkarnam · Jan 24 '25 09:01