
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Results: 541 crawl4ai issues

The `AsyncWebCrawler` is currently returning arrays of JSON objects for each scrape, even when a Pydantic schema and prompt are specified to return only one JSON object per...

question
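Until the extraction itself returns a single object, one workaround is to unwrap one-element arrays in post-processing. A minimal sketch, assuming `extracted_content` is the JSON string produced by the extraction (the `unwrap_single` helper is hypothetical, not crawl4ai API):

```python
import json

def unwrap_single(extracted_content: str):
    """If the extraction returned a one-element JSON array,
    unwrap it to the single object the schema asked for."""
    data = json.loads(extracted_content)
    if isinstance(data, list) and len(data) == 1:
        return data[0]
    return data

# An extraction that came back wrapped in an array:
print(unwrap_single('[{"title": "Inception", "year": 2010}]'))
# → {'title': 'Inception', 'year': 2010}
```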

When scraping the ranking of movies on Douban, the message "IE 11 is not supported. For an optimal experience, visit our site on another browser" appears. I also encountered the...

bug
question

A good feature, IMO, would be to expose the tokens used, so that we know how many tokens each request consumed when storing analytics...
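Until the library exposes real usage numbers, the caller can accumulate a rough per-request tally. A minimal sketch using the common 4-characters-per-token heuristic (the `TokenTally` helper and the heuristic are assumptions, not crawl4ai API):

```python
class TokenTally:
    """Accumulate approximate token counts per request for analytics."""

    CHARS_PER_TOKEN = 4  # rough heuristic; a real tokenizer would be exact

    def __init__(self):
        self.requests = []

    def record(self, url: str, prompt: str, completion: str) -> int:
        # Approximate tokens from total character count of prompt + completion.
        tokens = (len(prompt) + len(completion)) // self.CHARS_PER_TOKEN
        self.requests.append({"url": url, "tokens": tokens})
        return tokens

    @property
    def total(self) -> int:
        return sum(r["tokens"] for r in self.requests)

tally = TokenTally()
tally.record("https://example.com", "p" * 400, "c" * 400)
print(tally.total)  # → 200
```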

Hello, I attempted to use the LLMExtractionStrategy code provided in the documentation for OpenAI and adapted it to work with Hugging Face. However, I encountered the following error: Provider List:...

```python
import asyncio
import json

from crawl4ai import AsyncWebCrawler


async def main():
    # note: the original had "verbos=True", a typo for "verbose=True"
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://batteriesnews.com/lg-chem-files-lawsuit-against-unit-of-chinas-ronbay-over-battery-tech",
            bypass_cache=True,
            word_count_threshold=10,
        )
        print(result.fit_markdown)  # Print clean markdown content


asyncio.run(main())
```

Hello, I've found that version 0.3.71 is significantly more stable than 0.3.72. The `fit_markdown` function consistently returns empty results in the newer version. Additionally, using `magic=True` limits crawling capabilities...

According to #102 the requirements specified are minimum versions. Currently they are defined as fixed versions in requirements.txt and setup.py, which limits projects consuming this package to using...
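For illustration, the difference in requirements.txt would look like this (version numbers taken from this thread; the exact lower bound to use is a maintainer decision):

```
# pinned (current): consumers can only ever use this exact version
crawl4ai==0.3.72

# minimum version (proposed): consumers may resolve any compatible newer release
crawl4ai>=0.3.72
```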

Some websites present a CAPTCHA mechanism repeatedly, while Playwright already has a feature to take control of the user's own browser (e.g., by launching the Chrome browser through...

I'm working on a web crawling project where I need to convert HTML content into Markdown. However, I want certain HTML tags, like ..., to remain in their original HTML...
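One way to get this behavior is to shield the tags you want to keep behind placeholders before conversion and restore them afterwards. A minimal sketch, where the tag list is an assumption and the actual HTML-to-Markdown step is whatever converter you already use:

```python
import re

KEEP_TAGS = ("sup", "sub")  # tags to keep as raw HTML (assumption)

def shield(html: str):
    """Replace kept tags with placeholders so the converter leaves them alone."""
    saved = []
    def stash(m):
        saved.append(m.group(0))
        return f"@@HTML{len(saved) - 1}@@"
    pattern = "|".join(rf"<{t}\b[^>]*>.*?</{t}>" for t in KEEP_TAGS)
    return re.sub(pattern, stash, html, flags=re.S), saved

def unshield(markdown: str, saved):
    """Put the original HTML fragments back after conversion."""
    for i, fragment in enumerate(saved):
        markdown = markdown.replace(f"@@HTML{i}@@", fragment)
    return markdown

shielded, saved = shield("E = mc<sup>2</sup> is famous")
# ... run `shielded` through your HTML-to-Markdown converter here ...
print(unshield(shielded, saved))  # → E = mc<sup>2</sup> is famous
```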

Hi! I'm currently working with the repo, but when I try to scrape multiple websites this message keeps popping up: `Error caching URL: database is locked`. async with AsyncWebCrawler(verbose=False,...

bug
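The error comes from SQLite refusing a write while another connection holds the lock on the cache database. Two standard mitigations are a busy timeout and WAL journal mode; a minimal sketch against a throwaway database (the file path is an assumption here, since crawl4ai manages its cache db internally):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

# timeout makes SQLite wait for the lock instead of failing immediately;
# WAL mode lets readers proceed while a writer holds the lock.
conn = sqlite3.connect(db_path, timeout=30)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT OR REPLACE INTO cache VALUES (?, ?)",
    ("https://example.com", "<html>...</html>"),
)
conn.commit()
print(conn.execute("SELECT count(*) FROM cache").fetchone()[0])  # → 1
conn.close()
```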