crawl4ai issues

[Bug]: Batch Processing with MemoryAdaptiveDispatcher Missing Crawl Outputs

1

crawl4ai version: 0.5.0. Expected Behavior: All 680 product URLs passed to crawler.arun_many() should produce a corresponding result.extracted_content if the crawling and extraction process succeeds. Each result should be...

jacobshenn

🐞 Bug

🩺 Needs Triage
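To see which of the submitted URLs actually lack output, one option is to compare the input list against the returned results. This is a minimal sketch. `missing_outputs` is a hypothetical helper, not part of crawl4ai, and it only relies on duck-typed `url`, `success` and `extracted_content` attributes. A result that reports a different URL from the one submitted (for example after a redirect) will be flagged as missing.

```python
from types import SimpleNamespace
from typing import Iterable, List


def missing_outputs(urls: Iterable[str], results: Iterable[object]) -> List[str]:
    """Return input URLs with no successful result that carries extracted_content.

    Hypothetical diagnostic helper; attributes are read with getattr so any
    result-like object works.
    """
    produced = {
        getattr(r, "url", None)
        for r in results
        if getattr(r, "success", False) and getattr(r, "extracted_content", None)
    }
    # Preserve input order so gaps are easy to match against the batch.
    return [u for u in urls if u not in produced]


# Demo with stand-in result objects (not real crawl results).
demo = [SimpleNamespace(url="https://example.com/a", success=True, extracted_content="[]")]
print(missing_outputs(["https://example.com/a", "https://example.com/b"], demo))
# prints ['https://example.com/b']
```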

[Bug]: Unexpected error in _crawl_web

5

crawl4ai version: 0.7.7. Expected Behavior: The webpage should be parsed correctly. Current Behavior: When crawling https://www.toshiba-lifestyle.com/th-en/blog/how-to-choose-the-right-laundry-product-for-you, I get the following error: [ERROR]... × https://www.toshiba-lif...laundry-product-for-you |...

Martichou

🐞 Bug

📌 Root caused

[Bug]: Webhook API within Docker missing CrawlerConfig/Browser config

1

crawl4ai version: latest. Expected Behavior: I successfully built a crawler request with BrowserConfig and CrawlerConfig (with CSSExtraction, etc.). Now I want to build a webhook strategy to not...

devonik

🐞 Bug

🩺 Needs Triage

Fix BrowserConfig proxy_config serialization

Summary: Ensures `BrowserConfig.to_dict()` emits JSON-safe data by converting nested ProxyConfig objects into dictionaries. Prevents `TypeError: Object of type ProxyConfig is not JSON serializable` in environments (like Docker) that...

SohamKukreti
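The conversion described in this summary can be illustrated with a minimal sketch. `ProxyConfigSketch`, `BrowserConfigSketch` and `json_safe` are hypothetical stand-ins, not crawl4ai's classes or the PR's code; the point is only that nested objects exposing `to_dict()` are turned into plain dicts before `json.dumps()` sees them.

```python
import json
from typing import Any, Optional


class ProxyConfigSketch:
    """Hypothetical stand-in for a proxy settings object (not crawl4ai's class)."""

    def __init__(self, server: str, username: Optional[str] = None,
                 password: Optional[str] = None) -> None:
        self.server = server
        self.username = username
        self.password = password

    def to_dict(self) -> dict:
        return {"server": self.server, "username": self.username,
                "password": self.password}


def json_safe(value: Any) -> Any:
    """Recursively replace objects exposing to_dict() with plain JSON types."""
    if hasattr(value, "to_dict"):
        return json_safe(value.to_dict())
    if isinstance(value, dict):
        return {key: json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(item) for item in value]
    return value


class BrowserConfigSketch:
    """Hypothetical stand-in for a browser config holding a nested proxy object."""

    def __init__(self, headless: bool = True,
                 proxy_config: Optional[ProxyConfigSketch] = None) -> None:
        self.headless = headless
        self.proxy_config = proxy_config

    def to_dict(self) -> dict:
        # Copying attributes as-is would keep the ProxyConfigSketch instance and
        # make json.dumps() raise TypeError; json_safe() converts it to a dict.
        return {key: json_safe(val) for key, val in vars(self).items()}


cfg = BrowserConfigSketch(proxy_config=ProxyConfigSketch("http://proxy.example:8080"))
print(json.dumps(cfg.to_dict()))
# prints {"headless": true, "proxy_config": {"server": "http://proxy.example:8080", "username": null, "password": null}}
```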

[Bug]: Docker API JSON serialization fails for ProxyConfig

1

crawl4ai version: 0.7.7. Expected Behavior: When using `proxy_config` with `BrowserConfig`, the configuration should be serializable to JSON for use with the Docker API server's crawler pool. The `BrowserConfig.to_dict()`...

SohamKukreti

🐞 Bug

⚙️ In-progress

📌 Root caused

fix: prevent memory leak by closing unused context

Summary: When scraping many URLs continuously, browser contexts accumulate in memory and are never cleaned up. The existing cleanup mechanism only runs when browsers go idle, which never happens...

Martichou
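The idea in this summary, running cleanup on every request instead of only when the browser goes idle, can be sketched as a small pool keyed by context. Everything here (`ContextPoolSketch`, `FakeContext`, the `max_idle` threshold) is hypothetical and says nothing about how the actual PR implements the fix; a real browser context would need an async `close()`.

```python
import time
from typing import Callable, Dict


class FakeContext:
    """Hypothetical stand-in for a browser context; only records whether it was closed."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ContextPoolSketch:
    """Reuses contexts by key and closes those unused for more than max_idle seconds.

    Stale contexts are swept on every acquire(), so cleanup still happens
    while the browser is continuously busy.
    """

    def __init__(self, max_idle: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_idle = max_idle
        self.clock = clock
        self._contexts: Dict[str, FakeContext] = {}
        self._last_used: Dict[str, float] = {}

    def acquire(self, key: str) -> FakeContext:
        now = self.clock()
        self._close_stale(now, keep=key)
        ctx = self._contexts.get(key)
        if ctx is None:
            ctx = FakeContext(key)
            self._contexts[key] = ctx
        self._last_used[key] = now
        return ctx

    def _close_stale(self, now: float, keep: str) -> None:
        # Never close the context that is about to be handed out.
        stale = [k for k, last in self._last_used.items()
                 if k != keep and now - last > self.max_idle]
        for k in stale:
            self._contexts.pop(k).close()
            del self._last_used[k]

    def __len__(self) -> int:
        return len(self._contexts)
```

The injectable `clock` is only there so the sweep can be exercised without real waiting.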

[Bug]: mean_delay does not work with CrawlerRunConfig

1

crawl4ai version: 0.7.6. Expected Behavior: When I set mean_delay, there should be a delay between requests. Current Behavior: The mean_delay setting is ignored. Is this reproducible? Yes...

nguyenthengocdev

🐞 Bug

🩺 Needs Triage
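For reference, the behaviour the reporter expects can be sketched as a sequential loop that pauses between requests. This assumes the intended semantics are a base pause of `mean_delay` seconds plus a uniform random jitter of up to `max_range` seconds; `crawl_with_delay` and its parameters are hypothetical, and the sketch does not model crawl4ai's concurrent dispatchers.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Optional


async def crawl_with_delay(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    mean_delay: float,
    max_range: float = 0.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> List[str]:
    """Fetch URLs one at a time, pausing mean_delay + U(0, max_range) seconds between them.

    Hypothetical sketch: fetch, rng and sleep are injectable so the timing can be
    checked without a browser or real waiting.
    """
    rng = rng or random.Random()
    results: List[str] = []
    for i, url in enumerate(urls):
        if i > 0:
            # No pause before the first request, one pause between each later pair.
            await sleep(mean_delay + rng.uniform(0, max_range))
        results.append(await fetch(url))
    return results
```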

chore: update lxml version

1

Summary: Updates the lxml dependency to 6.0. List of files changed and why: To upgrade the lxml constraint and regenerate the lock file, these files were touched: pyproject.toml, uv.lock...

mziv

[Bug]: wrong permissions on the .cache folder in docker image

4

Hello, I'm trying to use the link scoring feature with the following config, but I'm getting the error below. crawl4ai is running in Docker. Any idea what is wrong? [LINK_EXTRACT] ℹ...

faileon

🐞 Bug

⚙️ In-progress

[Bug]: The LLM strategy always sends all tokens from the URL to the LLM server even when the URL input is HTML content

2

crawl4ai version: 0.6.3. Expected Behavior: My example crawler: llm_strategy = LLMExtractionStrategy( llm_config=self.llm_config, schema=PdfDoc.model_json_schema(), extraction_type="schema", instruction=""" From the crawled content, extract data from html - data in html...

phamngocquy

🐞 Bug

⚙ Done

📌 Root caused
