
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Results: 541 crawl4ai issues

I've identified an issue in the Advanced Usage example in the README.md file. The current CSS selector used for extracting content from the NBC News business page is...

bug
enhancement
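A minimal sketch for verifying whichever selector the README settles on. The selector string below is a placeholder, not the value from the README, and depending on the installed crawl4ai version the selector may need to go through a run-config object instead of `arun()` directly.

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            # Placeholder selector: inspect the live page and substitute the
            # selector the README example is supposed to use.
            css_selector="article",
        )
        print(result.markdown[:500] if result.success else result.error_message)

asyncio.run(main())
```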

Hi, I tried to create a Lambda layer for this library, but it's not working. Is there a Lambda layer zip or Docker image available for using the library in Lambda?

Many pages like https://www.wsj.com/world/china/chinas-patriotic-rhetoric-takes-a-violent-turn-6266ca09 are not crawlable. I've tried both sync and async mode; both return a failure: ``` [ERROR] 🚫 Failed to crawl https://www.nbcnews.com/business, error: Failed to crawl https://www.nbcnews.com/business: Timeout...
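A small sketch for reproducing the failure and surfacing the error message rather than just the log line; the URLs are the ones from the report, and it assumes `result.success` / `result.error_message` are populated on failure as in the async API examples.

```python
import asyncio
from crawl4ai import AsyncWebCrawler

URLS = [
    "https://www.wsj.com/world/china/chinas-patriotic-rhetoric-takes-a-violent-turn-6266ca09",
    "https://www.nbcnews.com/business",
]

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        for url in URLS:
            result = await crawler.arun(url=url)
            if result.success:
                print(f"{url}: {len(result.markdown or '')} chars of markdown")
            else:
                # Paywalled or bot-protected sites often surface as navigation timeouts.
                print(f"{url}: failed with {result.error_message}")

asyncio.run(main())
```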

I need to pass an Authorization header field to the LLM service, which runs on my own server with a proxy/authentication in front of it. How is that possible?
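A hedged sketch of one way this could look, assuming the installed LLMExtractionStrategy exposes `base_url`- and `extra_args`-style parameters for pointing at a self-hosted, proxied endpoint; the parameter names and the header plumbing are assumptions to check against the constructor of your version.

```python
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Assumptions: `base_url` points the provider at the self-hosted proxy, and `extra_args`
# (if supported by your version) is forwarded to the underlying completion call so the
# Authorization header reaches the proxy. Verify both against your installed signature.
strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o-mini",                   # placeholder model name
    api_token="unused-or-proxy-token",               # placeholder token
    base_url="https://llm.internal.example.com/v1",  # hypothetical self-hosted endpoint
    extra_args={"extra_headers": {"Authorization": "Bearer <proxy-token>"}},
    instruction="Extract the main points of the page.",
)

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", extraction_strategy=strategy)
        print(result.extracted_content)

asyncio.run(main())
```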

I created an AWS Lambda Docker image, and it fails on this line: from crawl4ai import AsyncWebCrawler ```{ "errorMessage": "[Errno 30] Read-only file system: '/home/sbx_user1051'", "errorType": "OSError", "requestId": "", "stackTrace": [...

enhancement
question
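The [Errno 30] points at Lambda's filesystem: everything except /tmp is read-only, and crawl4ai creates files under the home directory at import time. A hedged handler sketch, assuming that redirecting HOME to /tmp before the import is enough for your version (the exact set of directories crawl4ai touches may differ):

```python
# Redirect writable paths to /tmp *before* importing crawl4ai, since Lambda only
# allows writes under /tmp and crawl4ai writes under the user's home directory.
import os
os.environ["HOME"] = "/tmp"                      # so ~/.crawl4ai resolves to /tmp/.crawl4ai
os.environ.setdefault("XDG_CACHE_HOME", "/tmp")  # assumption: cover other cache writers too

import asyncio
from crawl4ai import AsyncWebCrawler

def handler(event, context):
    async def crawl():
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url=event.get("url", "https://example.com"))
            return result.markdown if result.success else result.error_message
    return asyncio.run(crawl())
```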

Hey, thanks for the lib :) I'm playing around with it, trying to crawl `https://mantine.dev/core/button/?t=props`. If you have a quick answer for why it doesn't work, that would be great; otherwise I'll...

bug
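The props view on that page is client-rendered, so the content of interest may not exist when the initial HTML is captured. A hedged sketch, assuming your crawl4ai version accepts `wait_for` (a CSS selector to wait for) and `bypass_cache` on `arun()`; both parameter names are assumptions to verify.

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://mantine.dev/core/button/?t=props",
            bypass_cache=True,
            wait_for="table",  # assumption: wait until the client-rendered props table exists
        )
        print(result.markdown[:1000] if result.success else result.error_message)

asyncio.run(main())
```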

Could you please add the possibility to change the timeout? In some places and containers, pages can take more than 60 seconds. crawl4ai/crawl4ai/async_crawler_strategy.py, line 251: response = await page.goto(url, wait_until="domcontentloaded", timeout=60000)
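Until the timeout is configurable upstream, a workaround sketch that drives the same Playwright navigation call directly with a caller-chosen limit (values are illustrative, and this bypasses crawl4ai's post-processing):

```python
import asyncio
from playwright.async_api import async_playwright

async def fetch_html(url: str, timeout_ms: int = 180_000) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        # The same navigation call crawl4ai makes in async_crawler_strategy.py,
        # but with the timeout supplied by the caller instead of a hard-coded 60 s.
        await page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
        html = await page.content()
        await browser.close()
        return html

print(len(asyncio.run(fetch_html("https://example.com"))))
```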

Traceback (most recent call last): File "C:\Users\57682\PycharmProjects\pythonProject\main.py", line 2, in from crawl4ai import AsyncWebCrawler File "C:\Users\57682\PycharmProjects\pythonProject\venv\Lib\site-packages\crawl4ai\__init__.py", line 3, in from .async_webcrawler import AsyncWebCrawler File "C:\Users\57682\PycharmProjects\pythonProject\venv\Lib\site-packages\crawl4ai\async_webcrawler.py", line 9, in from .chunking_strategy...