[Bug]: JavaScript conflict when js_code and wait_for are specified at the same time
crawl4ai version
0.4.25
Expected Behavior
In my code, I set up wait_for as follows:
wait_for = """js:() => {
    console.log('the current number of elements detected:', document.querySelectorAll('li').length);
    return false; // always return false
}"""
Since this wait_for always returns false, the crawler should keep polling the JS function waiting for it to return true, and close the page only when the timeout expires.
For details, see the documentation: https://docs.crawl4ai.com/core/page-interaction/
For more complex conditions (e.g., waiting for content length to exceed a threshold), prefix js::
wait_condition = """() => {
    const items = document.querySelectorAll('.athing');
    return items.length > 50; // Wait for at least 51 items
}"""
config = CrawlerRunConfig(wait_for=f"js:{wait_condition}")
Behind the Scenes: Crawl4AI keeps polling the JS function until it returns true or a timeout occurs.
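The polling semantics described above can be sketched in plain Python. This is an illustrative model only (poll_until is not a crawl4ai API): a predicate is re-evaluated on an interval until it returns a truthy value, and a timeout error is raised if it never does — which is why a wait_for that always returns false should run until the timeout, not be skipped.

```python
import time

def poll_until(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Illustrative model of a `js:` wait_for condition; not crawl4ai code.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            # A truthy result ends the wait immediately.
            return result
        time.sleep(interval)
    # The condition never became true within the allotted time.
    raise TimeoutError("wait_for condition never returned true")
```

Under this model, `poll_until(lambda: False, timeout=...)` always ends in a TimeoutError after the full timeout, mirroring the behavior the `return false` wait_for above was expected to produce.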
Current Behavior
It doesn't seem to execute the js code in wait_for; instead it ignores it, immediately returns the HTML content, and closes the browser at a seemingly random point.
Output as follows:
duration time: 3.317031145095825s
items loaded count 42
(output when js_code does not include the jump-to-next-page click document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();)
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
import asyncio
import time

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

# When the page URL changes as a result of a js_code click, the js code in
# wait_for is no longer executed.
async def process_dynamic_content():
    browser_config = BrowserConfig(
        headless=False,
        verbose=False,
    )
    wait_for = """js:() => {
        console.log('the current number of elements detected:', document.querySelectorAll('li').length);
        return false; // always return false
    }"""
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        wait_for=wait_for,
        js_code="""
            window.scrollTo(0, document.body.scrollHeight);
            document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();
        """,
        delay_before_return_html=0.1,
    )
    time_start = time.time()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            'https://www.ihanfan.net/type/zongyi.html',
            config=run_config,
        )
    end = time.time()
    print("duration time: {}s".format(end - time_start))
    print("items loaded count", result.cleaned_html.count('lazy'))

asyncio.run(process_dynamic_content())
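A possible workaround, while the conflict is investigated, is to split the pagination click and the wait into two arun() calls that share one browser session, so that wait_for is evaluated against the page reached after the click. This is an untested sketch: it assumes crawl4ai's session_id and js_only options behave as described in its session-management documentation (verify against your installed version), and the 50-item threshold is purely illustrative.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

# Untested workaround sketch: perform the pagination click and the wait_for
# polling in two arun() calls that share one session, so wait_for runs on
# the page that exists *after* the click. session_id / js_only are crawl4ai
# session-management options; verify them for your version.
async def paginate_then_wait():
    browser_config = BrowserConfig(headless=False, verbose=False)
    click_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        session_id="pagination",  # keep the same tab alive between calls
        js_code="""
            window.scrollTo(0, document.body.scrollHeight);
            document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();
        """,
    )
    wait_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        session_id="pagination",  # reuse the session from the click
        js_only=True,             # interact with the existing page; no fresh navigation
        wait_for="js:() => document.querySelectorAll('li').length > 50",  # illustrative threshold
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        await crawler.arun('https://www.ihanfan.net/type/zongyi.html', config=click_config)
        result = await crawler.arun('https://www.ihanfan.net/type/zongyi.html', config=wait_config)
        print("items loaded count", result.cleaned_html.count('lazy'))

asyncio.run(paginate_then_wait())
```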
OS
Windows
Python version
3.9.10
Browser
chromium
Browser version
No response
Error logs & Screenshots (if applicable)
If js_code includes the jump-to-next-page click document.querySelector('ul.pagination.pagination-lg > a:nth-last-child(4)')?.click();, the output does not meet expectations.
If js_code does not include that click, the output is as expected.
Why is the js code in wait_for silently ignored after js_code triggers a jump to the next page?
Hello @Asyou-GD would you update to our latest release v0.7.7 and let us know if you are still facing this issue?