
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Results: 541 crawl4ai issues

The `AsyncWebCrawler` is currently returning arrays of JSON objects for each scrape, even when a Pydantic schema and prompt are specified to return only one JSON object per...

question
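Until the extraction itself returns a single object, one workaround is to unwrap one-element arrays in post-processing. A minimal sketch, assuming `extracted_content` is the JSON string produced by the extraction (the `unwrap_single` helper is hypothetical, not crawl4ai API):

```python
import json

def unwrap_single(extracted_content: str):
    """If the extraction returned a one-element JSON array,
    unwrap it to the single object the schema asked for."""
    data = json.loads(extracted_content)
    if isinstance(data, list) and len(data) == 1:
        return data[0]
    return data

# An extraction that came back wrapped in an array:
print(unwrap_single('[{"title": "Inception", "year": 2010}]'))
# → {'title': 'Inception', 'year': 2010}
```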

When scraping the ranking of movies on Douban, the message "IE 11 is not supported. For an optimal experience, visit our site on another browser" appears. I also encountered the...

bug
question

A good feature, IMO, would be to expose the tokens used, so that we know how many tokens each request consumed when storing analytics...
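Until the library exposes real usage numbers, the caller can accumulate a rough per-request tally. A minimal sketch using the common 4-characters-per-token heuristic (the `TokenTally` helper and the heuristic are assumptions, not crawl4ai API):

```python
class TokenTally:
    """Accumulate approximate token counts per request for analytics."""

    CHARS_PER_TOKEN = 4  # rough heuristic; a real tokenizer would be exact

    def __init__(self):
        self.requests = []

    def record(self, url: str, prompt: str, completion: str) -> int:
        # Approximate tokens from total character count of prompt + completion.
        tokens = (len(prompt) + len(completion)) // self.CHARS_PER_TOKEN
        self.requests.append({"url": url, "tokens": tokens})
        return tokens

    @property
    def total(self) -> int:
        return sum(r["tokens"] for r in self.requests)

tally = TokenTally()
tally.record("https://example.com", "p" * 400, "c" * 400)
print(tally.total)  # → 200
```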

Hello, I attempted to use the LLMExtractionStrategy code provided in the documentation for OpenAI and adapted it to work with Hugging Face. However, I encountered the following error: Provider List:...

```python
import asyncio
import json

from crawl4ai import AsyncWebCrawler


async def main():
    # note: the original had "verbos=True", a typo for "verbose=True"
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://batteriesnews.com/lg-chem-files-lawsuit-against-unit-of-chinas-ronbay-over-battery-tech",
            bypass_cache=True,
            word_count_threshold=10,
        )
        print(result.fit_markdown)  # Print clean markdown content


asyncio.run(main())
```

Hello, I've found that version 0.3.71 is significantly more stable than 0.3.72. The `fit_markdown` function consistently returns empty results in the newer version. Additionally, using `magic=True` limits crawling capabilities...

According to #102 the requirements specified are minimum versions. Currently they are defined as fixed versions in requirements.txt and setup.py, which limits projects consuming this package to using...
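For illustration, the difference in requirements.txt would look like this (version numbers taken from this thread; the exact lower bound to use is a maintainer decision):

```
# pinned (current): consumers can only ever use this exact version
crawl4ai==0.3.72

# minimum version (proposed): consumers may resolve any compatible newer release
crawl4ai>=0.3.72
```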

Some websites present a CAPTCHA mechanism repeatedly, while Playwright already has a feature to take control of the user's own browser (e.g., by launching the Chrome browser through...

I'm working on a web crawling project where I need to convert HTML content into Markdown. However, I want certain HTML tags, like ..., to remain in their original HTML...
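One way to get this behavior is to shield the tags you want to keep behind placeholders before conversion and restore them afterwards. A minimal sketch, where the tag list is an assumption and the actual HTML-to-Markdown step is whatever converter you already use:

```python
import re

KEEP_TAGS = ("sup", "sub")  # tags to keep as raw HTML (assumption)

def shield(html: str):
    """Replace kept tags with placeholders so the converter leaves them alone."""
    saved = []
    def stash(m):
        saved.append(m.group(0))
        return f"@@HTML{len(saved) - 1}@@"
    pattern = "|".join(rf"<{t}\b[^>]*>.*?</{t}>" for t in KEEP_TAGS)
    return re.sub(pattern, stash, html, flags=re.S), saved

def unshield(markdown: str, saved):
    """Put the original HTML fragments back after conversion."""
    for i, fragment in enumerate(saved):
        markdown = markdown.replace(f"@@HTML{i}@@", fragment)
    return markdown

shielded, saved = shield("E = mc<sup>2</sup> is famous")
# ... run `shielded` through your HTML-to-Markdown converter here ...
print(unshield(shielded, saved))  # → E = mc<sup>2</sup> is famous
```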

Hi! I'm currently working with the repo, but when I try to scrape multiple websites this message keeps popping up: `Error caching URL: database is locked`. async with AsyncWebCrawler(verbose=False,...

bug
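The error comes from SQLite refusing a write while another connection holds the lock on the cache database. Two standard mitigations are a busy timeout and WAL journal mode; a minimal sketch against a throwaway database (the file path is an assumption here, since crawl4ai manages its cache db internally):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

# timeout makes SQLite wait for the lock instead of failing immediately;
# WAL mode lets readers proceed while a writer holds the lock.
conn = sqlite3.connect(db_path, timeout=30)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT OR REPLACE INTO cache VALUES (?, ?)",
    ("https://example.com", "<html>...</html>"),
)
conn.commit()
print(conn.execute("SELECT count(*) FROM cache").fetchone()[0])  # → 1
conn.close()
```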