crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Hello, I am trying to use crawl4ai with Ollama as the backend, as listed in config.py and mentioned on the providers page of Ollama. Note that I already...
I found a problem with v0.3.6 where the screenshot would be blank. This problem seems to occur on sites that are reloaded using JavaScript, such as "https://0-0.energy". I got around this...
It would be great if crawl4ai could scrape PDF files from websites and return both the PDF and a Markdown (MD) version of the content. Similar to this link https://arxiv.org/pdf/2402.06196...
The result.markdown doesn't include links. I have a use case where I'll be passing the markdown to an LLM to identify the product details. Here I want to get the product details...
Hi, We are trying to do the LLM extraction using the sample code provided [here](https://crawl4ai.com/mkdocs/examples/llm_extraction). This is how we have added the LLM details ``` async with AsyncWebCrawler(verbose=True) as crawler:...
```python
import os

from crawl4ai import WebCrawler
from crawl4ai.chunking_strategy import SlidingWindowChunking
from crawl4ai.extraction_strategy import LLMExtractionStrategy

crawler = WebCrawler()
crawler.warmup()
strategy = LLMExtractionStrategy(
    provider='openai',
    api_token=os.getenv('OPENAI_API_KEY')
)
loader = crawler.run(url=all_urls[0], extraction_strategy=strategy)
chunker = SlidingWindowChunking(window_size=2000,...
```
# fix requests version ## Description change the `requests>=2.26.0,=2.26.0,
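Right after installation, running this example worked without any problem. However, after I added an async agent to my code, this error appeared. Once the error had occurred, restoring the code to match the example did not help: it no longer runs. Could a maintainer take a look when you have time? Thanks. ![Uploading 微信图片_20241017230103.png…]()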
I am trying to crawl links from websites, but it is either returning empty results or taking too long to retrieve the links. How can I implement a strategy to...