Using js_code and wait_for together is broken in 0.4.22
If I pass any js_code to the crawler, it returns this error:
I have also explained the issue here:
I think commit 0982c63 broke this.
It probably just needs a null check for response here. I have fixed it for now by manually copying this file, with the null check added, into my Docker build.
@Udbhav8 Can you share the code snippet and URL? I can't replicate this error. Please share them with me and I will see what is causing it. Right now the following code works well:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

async def main():
    # Configure the browser settings
    browser_config = BrowserConfig()
    # Run configuration: bypass the cache, allow 60 s (60000 ms) per page,
    # run a small JS snippet and log the browser console
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=60000,
        js_code="(()=> {console.log('hi');})()",
        log_console=True,
    )
    # Pass the browser settings to the crawler so they are actually used
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://kidocode.com/',
            config=crawl_config,
        )
        if result.success:
            print("Raw Markdown Length:", len(result.markdown_v2.raw_markdown))
            print("Citations Markdown Length:", len(result.markdown_v2.markdown_with_citations))

if __name__ == "__main__":
    asyncio.run(main())
You can check this in Colab here https://colab.research.google.com/drive/1Ge5GvHwwAgM9LtIhjjJIcLGx8VXEKq2V?usp=sharing
self.crawler_args = {
    "headless": True,
    "remove_overlay_elements": True,
    "verbose": True,
    "always_bypass_cache": True,
    "bypass_cache": True,
    "light_mode": True,
    "user_agent_mode": "random",
    "user_agent_generator_config": {
        "device_type": "mobile",
        "os_type": "android",
    },
}
js_code = """
// Function to check if next page exists and click it
const nextButton = document.querySelector('kendo-pager-next-buttons span[title="Go to the next page"]');
console.log('Next button found:', nextButton);
if (nextButton) {
    nextButton.click();
    console.log('Clicked next button');
} else {
    console.log('No next button found - might be on last page');
}
"""

wait_condition = """() => {
    // Then check if document is ready and navigation is complete
    if (document.readyState !== 'complete') {
        console.log('Document not ready yet:', document.readyState);
        return false;
    }
    // Then check for job cells
    const jobCells = document.querySelectorAll('td[kendogridcell] a[href*="/vendor/jobs/details/"]');
    console.log('Number of job cells found:', jobCells.length);
    return jobCells.length > 0;
}"""

result = await crawler.arun(
    session_id=session_id,
    url="https://app.lotusone.com/#/vendor/jobs",
    js_code=js_code,
    wait_for=f"js:{wait_condition}",
    log_console=True,
)
And these are the logs it prints:
It's a page behind a login, so I will also have to give you the cookies for it. Could you suggest a time I can send them to you so they don't expire, and somewhere to send them?
I can also confirm that changing the code in async_crawler_strategy.py to the following worked for me, but now I have to apply these changes in my Dockerfile for everything to work as expected:
# Excerpt: the indented lines sit inside an enclosing "if" (not shown); the final "else" belongs to it
    await self.execute_hook("before_goto", page, context=context)
    try:
        response = await page.goto(
            url,
            wait_until=config.wait_until,
            timeout=config.page_timeout,
        )
    except Error as e:
        raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{e!s}")
    await self.execute_hook("after_goto", page, context=context)
    # page.goto() can return None, so only read status and headers when a response exists
    if response:
        status_code = response.status
        response_headers = response.headers
    else:
        status_code = 200
        response_headers = {}
else:
    status_code = 200
    response_headers = {}
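The null check in this patch can be pulled out into a small function that is easy to test without a browser. This is a minimal sketch; the name status_and_headers and the fallback values (status 200, empty headers) are assumptions that mirror the patch, not crawl4ai's actual code. On when response can be None: Playwright's documentation for page.goto says it returns null for navigation to about:blank or to the same URL with a different hash. That might apply when a session page is already on a hash-routed URL such as https://app.lotusone.com/#/vendor/jobs, but this is a hypothesis, not something confirmed in this thread.

```python
from types import SimpleNamespace
from typing import Any, Optional


def status_and_headers(response: Optional[Any]) -> tuple[int, dict]:
    """Return (status, headers) from a Playwright-like response, or defaults if None.

    Hypothetical helper: the defaults (200, {}) copy the patch above.
    """
    if response is None:
        # No main-resource response (e.g. a same-document hash navigation):
        # fall back to the same defaults the patch uses.
        return 200, {}
    # Playwright's Python Response exposes .status (int) and .headers (dict).
    return response.status, dict(response.headers)


if __name__ == "__main__":
    # A stand-in object with the two attributes the helper reads.
    fake = SimpleNamespace(status=404, headers={"content-type": "text/html"})
    print(status_and_headers(fake))
    print(status_and_headers(None))
```

Using "is None" rather than plain truthiness makes it explicit that only a missing response triggers the fallback.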
@Udbhav8 Please try to send me a message by Thursday, 19 December, at 2 p.m. Singapore time. Maybe you can create a calendar entry using my email address, and then we can align and communicate ([email protected]). Besides this, I also suggest that you try managing the browser yourself, especially in your case. Below are links to two other issues where I gave very detailed answers, and I believe they will help you a lot. Finally, I really want to keep working on this error. I want to know the situations in which the response is None; that is interesting to me. Before I add an if/else to handle it, I need to know when that happens.
https://github.com/unclecode/crawl4ai/issues/341#issuecomment-2541447030 https://github.com/unclecode/crawl4ai/issues/341#issuecomment-2546023875
Perfect, I have sent you a meeting invite for exactly that time. I will also email you the storage_state at exactly 2 p.m. so you can take a look in case you aren't able to join the meeting.
@unclecode I have sent you an email with the storage_state object from [email protected].
@unclecode Let me know if there is another time I can send you the tokens again so we can test synchronously.
@Udbhav8 I apologize for missing this conversation. Let's schedule another time now. We could plan for either Thursday 26 December or Friday 27 December at 2 p.m. Singapore time. Let me know which day works for you, and I'll create the calendar event. I will make sure to be available so we can test this together. Sorry again for missing the previous one.
Sorry @unclecode, I was away for the holidays. Why don't we do this: send me a meeting invite at [email protected] and I will make sure to make it work, since I don't get notifications for GitHub issue updates, haha.
@Udbhav8 I sent you an invitation to Discord, where we can chat and plan faster.
Did you send it to my email, [email protected]? I haven't received anything.
It's done.
@Udbhav8 I tried the following code (based on the snippet you shared) with the latest version. I can see that both the js_code and the wait_for code executed (I could see it in the console logs), and there was no issue with the response. If the issue still persists, please reopen this issue. I think the site has since changed and now asks for an email before it displays jobs, so you may have to change your code as well.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode, BrowserConfig

browser_config = BrowserConfig(
    headless=True,
    verbose=True,
    light_mode=True,
    # user_agent_mode="random",
    # user_agent_generator_config={
    #     "device_type": "mobile",
    #     "os_type": "android",
    # },
)
js_code = """
// Function to check if next page exists and click it
const nextButton = document.querySelector('kendo-pager-next-buttons span[title="Go to the next page"]');
console.log('Next button found:', nextButton);
if (nextButton) {
    nextButton.click();
    console.log('Clicked next button');
} else {
    console.log('No next button found - might be on last page');
}
"""

wait_condition = """() => {
    // Then check if document is ready and navigation is complete
    if (document.readyState !== 'complete') {
        console.log('Document not ready yet:', document.readyState);
        return false;
    }
    // Then check for job cells
    const jobCells = document.querySelectorAll('td[kendogridcell] a[href*="/vendor/jobs/details/"]');
    console.log('Number of job cells found:', jobCells.length);
    return jobCells.length > 0;
}"""
async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        session_id = "lotusone"
        # Run the crawler on a URL
        result = await crawler.arun(
            url="https://app.lotusone.com/#/vendor/jobs",
            config=CrawlerRunConfig(
                session_id=session_id,
                remove_overlay_elements=True,
                js_code=js_code,
                wait_for=f"js:{wait_condition}",
                log_console=True,
            ),
        )
        # Print the extracted content, or the error if the crawl failed
        if result.success:
            print(result.markdown.raw_markdown)
        else:
            print("Crawl failed:", result.error_message)

if __name__ == "__main__":
    asyncio.run(main())
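The rewrite above moves the flat self.crawler_args settings from the original report onto BrowserConfig and CrawlerRunConfig. For anyone migrating a similar dict, here is a minimal sketch of that split as a plain function, assuming the same grouping as the rewrite: headless, verbose, light_mode and the (commented-out) user-agent options on the browser side, remove_overlay_elements on the run side, and the two legacy bypass flags expressed as cache_mode=CacheMode.BYPASS. The helper name split_legacy_args and the key sets are hypothetical, not part of crawl4ai.

```python
# Hypothetical helper, not part of crawl4ai: it only partitions a plain dict.
# Key groups follow the split used in the rewrite above.
BROWSER_KEYS = {
    "headless",
    "verbose",
    "light_mode",
    "user_agent_mode",
    "user_agent_generator_config",
}
RUN_KEYS = {"remove_overlay_elements"}
# Legacy boolean cache flags; the rewrite expresses these as a cache mode instead.
LEGACY_BYPASS_FLAGS = {"always_bypass_cache", "bypass_cache"}


def split_legacy_args(args: dict) -> tuple[dict, dict, bool]:
    """Split flat crawler kwargs into (browser_kwargs, run_kwargs, bypass_cache)."""
    browser_kwargs: dict = {}
    run_kwargs: dict = {}
    bypass_cache = False
    for key, value in args.items():
        if key in BROWSER_KEYS:
            browser_kwargs[key] = value
        elif key in RUN_KEYS:
            run_kwargs[key] = value
        elif key in LEGACY_BYPASS_FLAGS:
            # Either legacy flag set to True means "bypass the cache".
            bypass_cache = bypass_cache or bool(value)
        else:
            # Fail loudly rather than silently dropping a setting.
            raise ValueError(f"Unmapped crawler argument: {key!r}")
    return browser_kwargs, run_kwargs, bypass_cache


if __name__ == "__main__":
    crawler_args = {
        "headless": True,
        "remove_overlay_elements": True,
        "verbose": True,
        "always_bypass_cache": True,
        "bypass_cache": True,
        "light_mode": True,
        "user_agent_mode": "random",
        "user_agent_generator_config": {"device_type": "mobile", "os_type": "android"},
    }
    browser_kwargs, run_kwargs, bypass = split_legacy_args(crawler_args)
    print(sorted(browser_kwargs))
    print(run_kwargs, bypass)
```

With crawl4ai installed, the results could then be passed as BrowserConfig(**browser_kwargs) and CrawlerRunConfig(cache_mode=CacheMode.BYPASS if bypass else CacheMode.ENABLED, **run_kwargs). Raising on unknown keys is a deliberate choice so that a setting is never silently lost during the migration.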