[Bug]: Browser path detection failing in Windmill.dev with crawl4ai
crawl4ai version
0.4.247
Expected Behavior
I'm trying to use crawl4ai with Windmill (https://www.windmill.dev/) for browser automation. However, I'm having trouble setting an executable path for the browser.
Issue:
The Windmill documentation (https://www.windmill.dev/docs/advanced/browser_automation#examples) provides an example for launching a browser instance:
const browser = await chromium.launch({
  executablePath: "/usr/bin/chromium",
  args: ['--no-sandbox', '--single-process', '--no-zygote', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-gpu'],
});
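For comparison, the same launch options expressed with Playwright's Python API might look like the sketch below. It assumes the `playwright` package is installed and that Chromium really is at `/usr/bin/chromium` on the Windmill worker, as the Windmill example suggests. `build_launch_kwargs` and `launch_system_chromium` are hypothetical helper names used only for illustration, and the sketch bypasses crawl4ai entirely.

```python
import asyncio

# Path and flags copied from the Windmill example above.
CHROMIUM_PATH = "/usr/bin/chromium"
CHROMIUM_ARGS = [
    "--no-sandbox",
    "--single-process",
    "--no-zygote",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
]


def build_launch_kwargs(executable_path: str = CHROMIUM_PATH) -> dict:
    """Return keyword arguments for Playwright's BrowserType.launch()."""
    return {
        "executable_path": executable_path,
        "args": list(CHROMIUM_ARGS),
        "headless": True,
    }


async def launch_system_chromium() -> str:
    """Launch the system Chromium and return its version string."""
    # Imported lazily so build_launch_kwargs() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(**build_launch_kwargs())
        try:
            return browser.version
        finally:
            await browser.close()


# Example (requires Playwright and a browser at CHROMIUM_PATH):
# print(asyncio.run(launch_system_chromium()))
```

If this launches successfully inside Windmill, the system browser itself is usable and the remaining problem is how crawl4ai is told where to find it.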
When running crawl4ai without configuring the specific path, I receive the following error:
Error: BrowserType.launch: Executable doesn't exist at /tmp/.cache/ms-playwright/chromium-1148/chrome-linux/chrome
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ playwright install ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════╝
Or the error:
INFO Error Failed to start browser: [Errno 2] No such file or directory: 'google-chrome'
I suspect that the line browser_path = self._get_browser_path() in async_crawler_strategy.py is unable to automatically detect the browser's location in the Windmill environment.
Question:
How can I properly configure something like executablePath for the browser (e.g., Chromium or Google Chrome) when using crawl4ai within Windmill?
Is there a way to manually specify the path, perhaps through an environment variable or a configuration setting within crawl4ai?
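To narrow this down, a small diagnostic along the following lines could show which browser binaries are actually visible inside the Windmill worker. This is only a sketch using the Python standard library: `find_browser_executable` and the `BROWSER_EXECUTABLE_PATH` variable are hypothetical names invented for this example, and crawl4ai is not known to read that variable.

```python
import os
import shutil
from typing import Optional, Sequence

# Common binary names to probe; this list is an assumption, not crawl4ai's own.
CANDIDATES = ("chromium", "chromium-browser", "google-chrome", "google-chrome-stable")


def find_browser_executable(
    env_var: str = "BROWSER_EXECUTABLE_PATH",  # hypothetical override variable
    candidates: Sequence[str] = CANDIDATES,
) -> Optional[str]:
    """Return the first usable browser path, or None if nothing is found."""
    # 1. An explicit override wins, but only if it points at an executable file.
    override = os.environ.get(env_var)
    if override and os.path.isfile(override) and os.access(override, os.X_OK):
        return override
    # 2. Otherwise search PATH for each candidate name in order.
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    print("browser executable:", find_browser_executable())
```

A result of `None` would match the `No such file or directory: 'google-chrome'` error, while a path such as `/usr/bin/chromium` would indicate that the binary exists but is not being picked up.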
Current Behavior
Error:
Error: BrowserType.launch: Executable doesn't exist at /tmp/.cache/ms-playwright/chromium-1148/chrome-linux/chrome
╔════════════════════════════════════════════════════════════╗
║ Looks like Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ playwright install ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════╝
Or this error:
INFO Error Failed to start browser: [Errno 2] No such file or directory: 'google-chrome'
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
# requirements:
# crawl4ai
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

# import os
# os.system("playwright install")
# os.system("playwright install-deps")
# os.system("crawl4ai-setup")


async def scrape(url: str):
    # Build the browser config first so the crawler is started with these flags
    # (the original snippet created it after the crawler and never used it).
    browser_config = BrowserConfig(
        headless=True,
        extra_args=[
            "--no-sandbox",
            "--single-process",
            "--no-zygote",
            "--disable-setuid-sandbox",
            "--disable-dev-shm-usage",
            "--disable-gpu",
        ],
        verbose=True,
    )
    crawl_config = CrawlerRunConfig(
        markdown_generator=DefaultMarkdownGenerator(),
        exclude_external_links=True,
        remove_overlay_elements=True,
        process_iframes=False,
    )
    crawler = AsyncWebCrawler(config=browser_config)
    try:
        await crawler.start()
        # arun is a coroutine, so it must be awaited.
        result = await crawler.arun(url=url, config=crawl_config)
        return result
    finally:
        # Always release the browser, even if start() or arun() raised.
        await crawler.close()


def main(url: str):
    # Windmill entry point: run the async scrape to completion.
    return asyncio.run(scrape(url))
OS
windmill.dev (cloud) - Linux?
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response