[Bug]: Documentation example fails (Crawling a Local HTML File)
crawl4ai version
0.6.3
Expected Behavior
Examples from the official documentation should be up to date and working. The example below should run successfully:
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig

async def crawl_local_file():
    local_file_path = "/home/user/file.html"
    file_url = f"file://{local_file_path}"
    config = CrawlerRunConfig(bypass_cache=True)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=file_url, config=config)
        if result.success:
            print("Markdown Content from Local File:")
            print(result.markdown)
        else:
            print(f"Failed to crawl local file: {result.error_message}")

asyncio.run(crawl_local_file())
Current Behavior
The example from documentation fails with error:
β― python crawling_local_html_file.py
Traceback (most recent call last):
File "/home/user/crawling_local_html_file.py", line 21, in <module>
asyncio.run(crawl_local_file())
File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/user/crawling_local_html_file.py", line 10, in crawl_local_file
config = CrawlerRunConfig(bypass_cache=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.venv/lib/python3.12/site-packages/crawl4ai/async_configs.py", line 993, in __init__
self.bypass_cache = bypass_cache
^^^^^^^^^^^^^^^^^
File "/home/user/.venv/lib/python3.12/site-packages/crawl4ai/async_configs.py", line 1101, in __setattr__
raise AttributeError(f"Setting '{name}' is deprecated. {self._UNWANTED_PROPS[name]}")
AttributeError: Setting 'bypass_cache' is deprecated. Instead, use cache_mode=CacheMode.BYPASS
Is this reproducible?
Yes
Inputs Causing the Bug
- none
Steps to Reproduce
Run the example from documentation.
Code snippets
OS
Ubuntu 24.04
Python version
3.12.3
Browser
google-chrome
Browser version
No response
Error logs & Screenshots (if applicable)
Please see above.
Replacing the deprecated parameter as shown below does not work either.
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.async_configs import CrawlerRunConfig

async def crawl_local_file():
    local_file_path = "/home/user/file.html"
    file_url = f"file://{local_file_path}"
    config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)  # Changed here

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=file_url, config=config)
        if result.success:
            print("Markdown Content from Local File:")
            print(result.markdown)
        else:
            print(f"Failed to crawl local file: {result.error_message}")

asyncio.run(crawl_local_file())
β― python crawling_local_html_file.py
[INIT].... β Crawl4AI 0.6.3
[ERROR]... Γ file:///home/user/file.html | Error: Unexpected error in _crawl_web at line 466 in crawl
(.venv/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
Error: cannot access local variable 'captured_console' where it is not associated with a value
Code context:
461 html=html,
462 response_headers=response_headers,
463 status_code=status_code,
464 screenshot=screenshot_data,
465 get_delayed_content=None,
466 → console_messages=captured_console,
467 )
468
469 elif url.startswith("raw:") or url.startswith("raw://"):
470 # Process raw HTML content
471 raw_html = url[4:] if url[:4] == "raw:" else url[7:]
Failed to crawl local file: Unexpected error in _crawl_web at line 466 in crawl (.venv/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
Error: cannot access local variable 'captured_console' where it is not associated with a value
Code context:
461 html=html,
462 response_headers=response_headers,
463 status_code=status_code,
464 screenshot=screenshot_data,
465 get_delayed_content=None,
466 → console_messages=captured_console,
467 )
468
469 elif url.startswith("raw:") or url.startswith("raw://"):
470 # Process raw HTML content
471 raw_html = url[4:] if url[:4] == "raw:" else url[7:]
Hi @ppetroskevicius
Thank you for bringing our attention to updating the documentation. We do need to set cache_mode=CacheMode.BYPASS.
Regarding the error you've shared, it has already been fixed in the 2025-MAY-2 branch and will be included in our next release.
@ntohidi I'm assuming this fix would also solve the same issue occurring for local markdown files?
I was able to get it working with BYPASS and by using the raw format for a markdown input: loading the content via the OS and passing the raw content to the crawler.
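For reference, the raw-content route can be sketched as below. The helper name to_raw_url and the file path are illustrative, not part of the crawl4ai API; the "raw:" prefix handling matches the crawler-strategy code context shown in the error above.

```python
from pathlib import Path

def to_raw_url(path: str) -> str:
    # Read the local file ourselves and wrap its contents in a "raw:" URL.
    # crawl4ai's crawler strategy strips the prefix before processing, per
    # the code context above: raw_html = url[4:] if url[:4] == "raw:" else url[7:]
    content = Path(path).read_text(encoding="utf-8")
    return f"raw:{content}"

# The resulting string can then be passed as crawler.arun(url=to_raw_url(...)),
# which avoids the file:// code path that raises the captured_console error.
```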
If anyone is still experiencing this problem you can simply set capture_console_messages=True for now.
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig, CacheMode

async def crawl_local_file():
    local_file_path = "<local_file_path>"  # Replace with your file path
    file_url = f"file://{local_file_path}"
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS, capture_console_messages=True  # see here
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=file_url, config=config)
        if result.success:
            print("Markdown Content from Local File:")
            print(result.markdown)
        else:
            print(f"Failed to crawl local file: {result.error_message}")

asyncio.run(crawl_local_file())
@Lachlan-White, what is the issue occurring with local markdown files? Can you please give me more details? :)
@abab-dev I'm not sure if I understand your problem correctly, but if you're having trouble getting your code to work, you can provide the absolute path to your file.
import asyncio
import os

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import CrawlerRunConfig, CacheMode

async def crawl_local_file_with_workaround():
    # Convert the relative file path to an absolute path
    absolute_path = os.path.abspath("output.html")  # Adjust this path as needed
    file_url = f"file://{absolute_path}"
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        capture_console_messages=True
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=file_url, config=config)
        if result.success:
            print("\n--- Markdown Content from Local File ---")
            print(result.markdown.raw_markdown)

            print("\n--- Captured Console Messages ---")
            if result.console_messages:
                for msg in result.console_messages:
                    print(f"[{msg['type'].upper()}]: {msg['text']}")
            else:
                print("No console messages were captured.")
        else:
            print(f"Failed to crawl local file: {result.error_message}")

asyncio.run(crawl_local_file_with_workaround())
@ntohidi I was trying to be helpful to everyone. Instead of doing this:
@ntohidi I'm assuming this fix would also solve the same issue occurring for local markdown files? I was able to get it working with BYPASS and using the raw format for a markdown input, by loading the content via the OS and passing the raw content to the crawler.
If we pass the flag capture_console_messages=True like I have shown, you don't have to do that workaround.
Raw markdown wasn't an option in the documentation, @ntohidi, so I just flagged it as raw HTML and it worked just fine :)
Oooh, I understand now! Thank you for your help; really appreciate it!