
[Bug]: Conflict of JavaScript code when js_code and wait_for are specified at the same time

Open · Asyou-GD opened this issue 9 months ago · 2 comments

crawl4ai version

0.4.25

Expected Behavior

In my code, I set up wait_for as follows:

wait_for = """js:() => {
               console.log(' the current number of elements is detected', document.querySelectorAll('li').length);
               return false; //always return false
           }"""

Because wait_for always returns false, the crawler should keep polling the JS condition (waiting for it to return true) until the timeout expires and the page is closed.
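For context, here is a minimal sketch of the flow I expect (the page_timeout value and the URL are assumptions for illustration only, not taken from my actual run):

import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

# A condition that never becomes true, so the crawler should keep polling it
# until the page timeout is reached.
wait_for = """js:() => {
    console.log('current number of <li> elements:', document.querySelectorAll('li').length);
    return false;  // always return false
}"""

async def expected_flow():
    # page_timeout (ms) is assumed here for illustration; I expect arun() to
    # poll wait_for until it returns true or this timeout expires.
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        wait_for=wait_for,
        page_timeout=30_000,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com", config=run_config)
        print(result.success, result.error_message)

# asyncio.run(expected_flow())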

For details, see the documentation: https://docs.crawl4ai.com/core/page-interaction/

For more complex conditions (e.g., waiting for content length to exceed a threshold), prefix the condition with "js:":

wait_condition = """() => {
    const items = document.querySelectorAll('.athing');
    return items.length > 50;  // Wait for at least 51 items
}"""

config = CrawlerRunConfig(wait_for=f"js:{wait_condition}")

Behind the Scenes: Crawl4AI keeps polling the JS function until it returns true or a timeout occurs.

Current Behavior

It doesn't seem to execute the JS code in wait_for; instead it ignores it, immediately returns the HTML content, and randomly closes the browser.

The output is as follows:

duration time: 3.317031145095825s, items loaded count 42 (output when js_code does not include the jump to the next page: document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();)

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets

import asyncio
import time

import crawl4ai
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, BrowserConfig, DefaultMarkdownGenerator
from crawl4ai.extraction_strategy import JsonXPathExtractionStrategy,JsonCssExtractionStrategy
# When the page URL changes as a result of a js_code click, the JS code in wait_for is no longer executed.
async def process_dynamic_content():
    browser_config = BrowserConfig(
        headless=False,
        verbose=False,
    )
    wait_for = """js:() => {
               console.log(' the current number of elements is detected', document.querySelectorAll('li').length);
               return false; //always return false
           }"""
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        wait_for=wait_for,
        js_code="""
            window.scrollTo(0, document.body.scrollHeight);
            document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();
        """,
        delay_before_return_html=0.1
    )
    time_start = time.time()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
             'https://www.ihanfan.net/type/zongyi.html',
             config=run_config,
        )
    end = time.time()
    print("duration time: {}s".format(end - time_start))
    print("items loaded count",result.cleaned_html.count('lazy'))

asyncio.run(process_dynamic_content())
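As a possible workaround (not a fix for this bug), I am experimenting with the multi-step session pattern from the page-interaction docs: run the click in a second arun() call on the same session with js_only=True, so the wait condition is evaluated after the navigation. This is only a sketch and assumes session_id and js_only behave as documented; the wait condition below is also just an assumption.

import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

# Assumed wait condition: the page has rendered some <li> items.
wait_for = "js:() => document.querySelectorAll('li').length > 0"

async def two_step_workaround():
    browser_config = BrowserConfig(headless=False, verbose=False)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # Step 1: initial load; keep the page open by giving it a session_id.
        await crawler.arun(
            'https://www.ihanfan.net/type/zongyi.html',
            config=CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                session_id="pagination",
                wait_for=wait_for,
            ),
        )
        # Step 2: reuse the same page, click to the next page, then wait again.
        result = await crawler.arun(
            'https://www.ihanfan.net/type/zongyi.html',
            config=CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                session_id="pagination",
                js_only=True,  # run js_code in the already-open page instead of a fresh navigation
                js_code="document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();",
                wait_for=wait_for,
            ),
        )
        print("items loaded count", result.cleaned_html.count('lazy'))

asyncio.run(two_step_workaround())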

OS

windows

Python version

3.9.10

Browser

chromium

Browser version

No response

Error logs & Screenshots (if applicable)

If js_code includes the jump to the next page (document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();), the output does not meet expectations (screenshots attached).

If js_code does not include the jump to the next page (document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();), the output is as expected (screenshot attached).

Asyou-GD · Mar 07 '25

Why is the complex js code in wait_for automatically ignored after jumping to the next page with js_code?

Asyou-GD · Mar 07 '25

Hello @Asyou-GD, would you update to our latest release v0.7.7 and let us know if you are still facing this issue?

Ahmed-Tawfik94 · Nov 21 '25