[Bug]: When body is hidden (e.g., in `<frame>`-based sites), AsyncPlaywrightCrawlerStrategy attribute error on 'config' (at `_crawl_web`)
crawl4ai version
0.4.248
Expected Behavior
Thank you for providing such an excellent open-source crawling library! I hope this detailed bug report is helpful in improving crawl4ai's robustness and handling of diverse website structures. I'm happy to contribute further or test any proposed solutions.
When crawling a website where the `<body>` element is hidden, particularly on sites that primarily use `<frame>` elements instead of a traditional `<body>` structure, crawl4ai should either:
- Gracefully handle the absence of a visible `<body>` and potentially attempt to extract content from the available frames.
- Raise a more informative exception that directly indicates the issue (e.g., "Body element not found or hidden. Consider sites with `<frame>` structure.").

In either case, the error handling logic inside `_crawl_web` should not itself crash.
Current Behavior
An AttributeError: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'config' is raised within the _crawl_web function of the AsyncPlaywrightCrawlerStrategy. This occurs after the crawler has already determined that the <body> element is hidden or unavailable, and is triggered during the error handling process itself.
Detailed Analysis of Current Behavior:
Analysis of the error code and source code reveals the following problem:
- The `AsyncPlaywrightCrawlerStrategy`'s `__init__` method correctly sets `self.browser_config`: https://github.com/unclecode/crawl4ai/blob/3b1025abbb6e2565602c05f9a959458da3531f3a/crawl4ai/async_crawler_strategy.py#L850-L864
- However, in the `_crawl_web` function, when the code waits for the `<body>` element and it's not found (leading to an `Error`), the error handling logic attempts to access `self.config`, which has not been initialized. This is where the `AttributeError` is raised: https://github.com/unclecode/crawl4ai/blob/3b1025abbb6e2565602c05f9a959458da3531f3a/crawl4ai/async_crawler_strategy.py#L1394-L1402
- This issue seems to be specific to sites using `<frame>` structures where the `<body>` element is hidden or unavailable. The error handling for the missing `<body>` triggers the problem with `self.config`. It's likely that this hasn't been reported before because most websites have a visible `<body>` element.
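The failure mode described in the analysis above can be reproduced without a browser. The following is a minimal sketch, not crawl4ai's actual code: `BrowserConfig`, `BrokenStrategy`, `PatchedStrategy`, and `on_body_hidden` are hypothetical stand-ins, and reading `verbose` from `self.browser_config` is only an assumption about what a fix could look like.

```python
class BrowserConfig:
    """Hypothetical stand-in for crawl4ai's BrowserConfig (only `verbose`)."""

    def __init__(self, verbose=False):
        self.verbose = verbose


class BrokenStrategy:
    """Stores only `browser_config`, like the strategy described above."""

    def __init__(self, browser_config):
        self.browser_config = browser_config

    def on_body_hidden(self, visibility_info):
        # Mirrors the failing check: `self.config` was never assigned,
        # so this attribute lookup raises AttributeError.
        if self.config.verbose:
            return f"Body visibility info: {visibility_info}"
        return None


class PatchedStrategy(BrokenStrategy):
    """Assumed fix: read the flag from the attribute __init__ actually sets."""

    def on_body_hidden(self, visibility_info):
        if self.browser_config.verbose:
            return f"Body visibility info: {visibility_info}"
        return None
```

With this sketch, `BrokenStrategy(BrowserConfig(verbose=True)).on_body_hidden(...)` raises the same kind of `AttributeError` as in the report, regardless of the page being crawled, while `PatchedStrategy` returns the debug message.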
Is this reproducible?
Yes
Inputs Causing the Bug
- **URL(s):**
- http://www.ksma.co.kr/ (Korean site demonstrating the issue)
- unitednglobal.com (Another example of frame-based)
- **Settings used:**
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
base_browser = BrowserConfig(
    browser_type="chromium",
    headless=False,  # Set to True for headless operation
    # text_mode=True  # optional
)
run_config = CrawlerRunConfig(
    process_iframes=True,
    cache_mode=CacheMode.BYPASS,
    magic=True,
    simulate_user=True,
    override_navigator=True,
    page_timeout=7000,
    wait_until="networkidle"
)
Steps to Reproduce
1. Set up crawl4ai with the provided configuration (or a similar configuration).
2. Attempt to crawl the URL `http://www.ksma.co.kr/` using the `AsyncWebCrawler`.
   async with AsyncWebCrawler(config=base_browser) as crawler:
       result = await crawler.arun(
           url="http://www.ksma.co.kr/",
           config=run_config
       )
3. Observe the `AttributeError` raised within the `_crawl_web` function.
Code snippets
OS
macOS
Python version
3.11.5
Browser
Chromium
Browser version
No response
Error logs & Screenshots (if applicable)
[INIT].... → Crawl4AI 0.4.248
[ERROR]... × http://www.ksma.co.kr/... | Error:
× Unexpected error in _crawl_web at line 1397 in _crawl_web (../../../.pyenv/versions/3.11.5/lib/python3.11/site-packages/crawl4ai/async_crawler_strategy.py):
Error: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'config'

Code context:
1392                         raise Error(f"Body element is hidden: {visibility_info}")
1393
1394             except Error:
1395                 visibility_info = await self.check_visibility(page)
1396
1397 →               if self.config.verbose:
1398                     self.logger.debug(
1399                         message="Body visibility info: {info}",
1400                         tag="DEBUG",
1401                         params={"info": visibility_info},
1402                     )
bumping this
I am experiencing this same issue. Happy to share the URLs for testing, if needed!
@sanghoho Thanks for reporting the issue. I'll look into it shortly. @RyanLynchUF Yes that would be helpful in reproducing and testing the bug. Can you share more URLs?
I had the same error listed in the original post, but I think my issue may have been a little different. It only occurred when using arun_many() on >5 URLs. I think it was a concurrency issue on my end, which prevented the pages from loading properly during the scraping. Everything seems to work fine as long as I use arun() or arun_many() on <5 URLs at a time.
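For anyone hitting the concurrency variant described in the comment above, one possible workaround is to feed URLs to the crawler in small batches. This is only a sketch under assumptions: `chunked`, `crawl_in_batches`, and `fake_crawl_many` are hypothetical helpers, `fake_crawl_many` stands in for a real crawl call so the example runs without a browser, and the batch size of 4 simply follows the "<5 URLs" observation.

```python
import asyncio


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def crawl_in_batches(crawl_many, urls, batch_size=4):
    """Await `crawl_many` on one small batch at a time and merge the results."""
    results = []
    for batch in chunked(urls, batch_size):
        # Each batch finishes before the next starts, which caps concurrency.
        results.extend(await crawl_many(batch))
    return results


async def fake_crawl_many(urls):
    """Hypothetical stand-in for a real crawl call; returns one item per URL."""
    await asyncio.sleep(0)
    return [f"crawled:{url}" for url in urls]
```

With a real crawler, `crawl_many` could be a small wrapper around `crawler.arun_many(...)`, assuming it returns one result per URL. This does not address the `self.config` bug itself; it only avoids the page-load failures that lead into that error path.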
Shouldn't this be either browser_config or without self.?
I get this error too, but I think it is just a typo in the attribute name, so it doesn't matter how you end up in this error path. Once you get here it will always fail, because AsyncCrawlerStrategy doesn't have `config`, only `browser_config`.
Hi @aravindkarnam @unclecode, I ran into the same issue: crawling a website throws the same error.
code - `import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

async def main():
    # Default browser configuration
    browser_config = BrowserConfig(verbose=True, java_script_enabled=True, browser_type="chromium", headless=True, viewport_width=1280, viewport_height=720,
        user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/116.0.0.0 Safari/537.36")
    # Default crawl run configuration
    run_config = CrawlerRunConfig(check_robots_txt=True, scan_full_page=True)
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url="https://eminds.ai/", config=run_config)
        print(result.markdown)  # Print clean markdown content

if __name__ == "__main__":
    asyncio.run(main())`
error -
[INIT].... → Crawl4AI 0.5.0.post4
[ERROR]... × https://eminds.ai/... | Error:
× Unexpected error in _crawl_web at line 622 in _crawl_web (../../../../Findly/py3.9.7/lib/python3.9/site-packages/crawl4ai/async_crawler_strategy.py):
Error: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'config'

Code context:
617                         raise Error(f"Body element is hidden: {visibility_info}")
618
619             except Error:
620                 visibility_info = await self.check_visibility(page)
621
622 →               if self.config.verbose:
623                     self.logger.debug(
624                         message="Body visibility info: {info}",
625                         tag="DEBUG",
626                         params={"info": visibility_info},
627                     )
None
Thanks everyone for reporting this! I've already fixed it in the 2025-APR-1 branch, and it will be included in an upcoming release. In the meantime, feel free to check out the branch and help test it.
I'll go ahead and close this issue, but don't hesitate to continue the conversation here if needed!
cc @aravindkarnam