crawl4ai
[Bug]: Injected JavaScript Not Executed Properly by Crawler
crawl4ai version
0.6.2
Expected Behavior
The injected JS code should execute and accept the cookies on the page.
Current Behavior
- The crawler fails to execute custom JavaScript injected via js_code, which is intended to interact with elements on the page (e.g., accepting cookies). Even with valid JS injected, the behavior is not as expected: the button is not clicked and the cookie prompt remains. Are there other settings that would let me accept cookies and crawl pages like this one?
- We are still encountering duplicate URLs during deep crawling. Additionally, even when an explicit content-type filter for text/html is applied, PDF files are still crawled.
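For the second point, a possible stopgap is to post-filter results outside the crawler. This is a minimal stdlib-only sketch, not crawl4ai functionality: `normalize_url`, `is_html`, and `filter_results` are hypothetical helper names, and it assumes each result can be reduced to a (URL, Content-Type header value) pair and that trailing slashes and fragments are not significant on the target site.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    # Scheme and host are case-insensitive, so compare them in lowercase.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Assumption: "/path/" and "/path" are the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment, which is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def is_html(content_type: Optional[str]) -> bool:
    """True only when the media type (ignoring parameters) is text/html."""
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def filter_results(items: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Keep the first occurrence of each normalized URL that is served as HTML."""
    seen = set()
    kept = []
    for url, content_type in items:
        key = normalize_url(url)
        if key in seen or not is_html(content_type):
            continue
        seen.add(key)
        kept.append(url)
    return kept
```

Feeding it the URL and Content-Type of every deep-crawl result would drop PDFs and repeated pages after the fact; it does not stop the crawler from fetching them in the first place.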
cc: @unclecode @aravindkarnam @ntohidi
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig


async def main():
    # JS injected into the page: wait 2 s, then click the cookie accept button.
    js_code = """
    setTimeout(() => {
        const acceptButton = document.querySelector('a.wscrOk2');
        if (acceptButton) {
            console.log('Found accept button - clicking...');
            acceptButton.click();
            setTimeout(() => {
                console.log('Cookies should be accepted now');
            }, 1000);
        } else {
            console.log('Accept button not found');
        }
    }, 2000);
    """

    config = CrawlerRunConfig(
        js_code=js_code,
        scan_full_page=True,
        check_robots_txt=False,
        verbose=True,
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://benefits.workday.com/uk/eap",
            config=config,
        )
        print(result.markdown)


if __name__ == "__main__":
    asyncio.run(main())
Output:
We place cookies on your device to enable this site to work, to enhance your user experience and to improve our services. Some cookies we use are necessary for the site to work, while others are used to help us manage and improve the site and the services we offer you. If you're happy to opt-in to our use of cookies just click the "Accept all cookies" button.
[Necessary cookies only](https://benefits.workday.com/uk/eap)[Accept all cookies](https://benefits.workday.com/uk/eap)
[Review our use of cookies and set your preferences](https://benefits.workday.com/uk/eap)
Our website uses cookies to distinguish you from other users of our website. This helps us to provide you with a good experience when you browse our website and also allows us to improve our site.
This Cookie Policy sets out the
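One possible explanation, which I have not confirmed, is that the snippet returns immediately because all of its work happens inside setTimeout, so the page may be captured before the click fires. The sketch below is a variant that might be worth trying: it polls for the button instead of using a fixed delay, and it asks the crawler to pause before capturing the HTML. `build_click_script` and `crawl` are hypothetical helper names, the 2-second delay is an arbitrary guess, and whether the crawler awaits the promise returned by the script is an assumption.

```python
import json


def build_click_script(selector: str, timeout_ms: int = 5000, interval_ms: int = 250) -> str:
    """Return JS that polls for `selector` and clicks it as soon as it exists."""
    sel = json.dumps(selector)  # emit the selector as a safely quoted JS string
    return f"""
(async () => {{
    const deadline = Date.now() + {timeout_ms};
    while (Date.now() < deadline) {{
        const button = document.querySelector({sel});
        if (button) {{
            button.click();
            return true;
        }}
        // Sleep briefly before looking for the button again.
        await new Promise(resolve => setTimeout(resolve, {interval_ms}));
    }}
    return false;
}})();
"""


async def crawl(url: str, selector: str) -> str:
    # Imported lazily so build_click_script can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_configs import CrawlerRunConfig

    config = CrawlerRunConfig(
        js_code=build_click_script(selector),
        # Assumption: 2 s is enough for the banner to close after the click.
        delay_before_return_html=2.0,
        verbose=True,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
    return str(result.markdown)
```

It could be run with `asyncio.run(crawl("https://benefits.workday.com/uk/eap", "a.wscrOk2"))`. If the banner still shows up in the markdown, that would suggest the problem is not just timing.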
OS
linux
Python version
3.9.7
Browser
linux
Browser version
131.0.6778.139
Error logs & Screenshots (if applicable)
No response