crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
### crawl4ai version 0.6.3 ### Expected Behavior - ### Current Behavior **Version:** `Crawl4AI v0.6.3` **Description:** When extracting raw text from a set of PDF URLs using `AsyncWebCrawler` with `PDFCrawlerStrategy` and...
### crawl4ai Version 0.6.2 ### Expected Behavior The crawler should successfully traverse and collect all valid pages up to the defined depth and page limit. ### Current Behavior The crawler...
### crawl4ai version 0.6.3 ### Expected Behavior The links array in CrawlResult should be derived based on whether there is a base tag on the page.
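To illustrate the behavior this issue asks for, here is a minimal sketch of resolving links against a page's `<base href>` using only the Python standard library. It is not Crawl4AI's implementation: `resolve_links` and `_LinkCollector` are hypothetical names introduced for this example, and the sketch assumes only `<a href>` links matter and that the first `<base>` tag with an `href` wins.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Hypothetical helper: records the first <base href> and every <a href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "base" and self.base_href is None and attr_map.get("href"):
            # Only the first <base> element with an href is honored.
            self.base_href = attr_map["href"]
        elif tag == "a" and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def resolve_links(html, page_url):
    """Return absolute URLs for all <a href> links, honoring <base href> if present."""
    collector = _LinkCollector()
    collector.feed(html)
    # A relative <base href> is itself resolved against the page URL.
    if collector.base_href:
        base = urljoin(page_url, collector.base_href)
    else:
        base = page_url
    return [urljoin(base, href) for href in collector.hrefs]
```

With a base tag present, relative links resolve against the base URL instead of the page URL; without one, they resolve against the page URL as usual.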
### crawl4ai version 0.6.3 ### Expected Behavior I expect that running the script in docs [Example: Building a Knowledge Graph](https://docs.crawl4ai.com/extraction/llm-strategies/#9-example-building-a-knowledge-graph) will produce a `kb_result.json` file with knowledge graph data. ###...
### crawl4ai version docker unclecode/crawl4ai:0.6.0-r1 ### Expected Behavior The documentation at "https://docs.crawl4ai.com/" should explain how to get a crawl4ai_api_token ### Current Behavior 1. I have pulled the docker unclecode/crawl4ai:0.6.0-r1 image and run it as follows: # Make...
### crawl4ai version Crawl4AI 0.6.3 ### Expected Behavior I expected to be able to use Crawl4AI 0.6.3 with a local Ollama model (llama3.2:3b) to extract structured data (news articles) from...
### crawl4ai version 1.6.3 ### Expected Behavior Return dynamic content ### Current Behavior Getting an "Error: list index out of range" when reusing the same crawler session ### Is this...
### crawl4ai version Crawl4AI 0.5.0.post8 ### Expected Behavior Hi, I'm new to Crawl4AI and I'm facing some issues that need clarification. I'm trying to scrape data from sites like PitchBook...
Feature/scraping strategy - refactor: Remove WebScrapingStrategy and fix metadata extraction (#995)
## Summary This PR refactors the content scraping strategy by removing the BeautifulSoup-based `WebScrapingStrategy` class and making `LXMLWebScrapingStrategy` the sole implementation. This simplifies the codebase by eliminating duplicate functionality while...
## Summary The following warning is raised on Linux when using `use_persistent_context=True` without any existing process listening to the debugging port: > [BROWSER]. ℹ pre-launch cleanup failed: Command '[['lsof', '-t',...