
AsyncWebCrawler returns arrays of JSON objects instead of single objects per scrape

Open • Udbhav8 opened this issue 1 year ago • 1 comment

Description

The AsyncWebCrawler currently returns an array of JSON objects for each scrape, even when both the Pydantic schema and the prompt specify that only one JSON object should be returned per scrape. This behavior is causing issues in our data processing pipeline and needs to be addressed.

Current Behavior

  • The AsyncWebCrawler returns an array of JSON objects for each scraped page.
  • This occurs even when a Pydantic schema is provided to define the structure of a single object.
  • The prompt given to the crawler also specifies that only one JSON object should be returned per scrape.

Expected Behavior

  • The AsyncWebCrawler should return a single JSON object for each scraped page.
  • The returned object should conform to the provided Pydantic schema.
  • The crawler should respect the prompt that specifies returning only one JSON object per scrape.
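The expected behavior above can be expressed as a small check. This is an illustrative sketch, not part of crawl4ai. It assumes the extracted payload arrives as a JSON string, and it uses a standard-library dataclass as a stand-in for the reporter's Pydantic schema, which was not shared. `ProductInfo`, its fields, and `matches_expected_shape` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


# Hypothetical stand-in for the reporter's Pydantic model; field names are illustrative only.
@dataclass
class ProductInfo:
    title: str
    price: str


def matches_expected_shape(extracted_json: str, model: type = ProductInfo) -> bool:
    """Return True if the payload is one JSON object carrying every field of `model`."""
    data: Any = json.loads(extracted_json)
    if not isinstance(data, dict):
        # A JSON array (the behavior reported in this issue) fails here.
        return False
    required = {f.name for f in fields(model)}
    return required.issubset(data.keys())


# Expected: a single object per scraped page.
print(matches_expected_shape('{"title": "Widget", "price": "9.99"}'))  # True
# Reported: an array of objects, even for one page.
print(matches_expected_shape('[{"title": "Widget", "price": "9.99"}]'))  # False
```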

Steps to Reproduce

  1. Set up an AsyncWebCrawler instance with a specified Pydantic schema.
  2. Provide a prompt that clearly states to return a single JSON object.
  3. Perform a scrape operation on a target URL.
  4. Observe that the returned result is an array of JSON objects instead of a single object.
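Until the root cause is confirmed, a pipeline could guard against the reported behavior by collapsing an array into one object before validating it. This is a hypothetical workaround, not a crawl4ai feature. `coerce_single_object` is an illustrative name, and the sketch assumes the payload arrives as a JSON string (for example, the `extracted_content` field of a crawl result).

```python
import json
from typing import Any


def coerce_single_object(extracted_json: str) -> dict[str, Any]:
    """Collapse a JSON array of objects into one object (hypothetical workaround).

    Values from earlier items win; later items only fill keys that are still
    missing or empty. A payload that is already an object is returned unchanged.
    """
    data = json.loads(extracted_json)
    if isinstance(data, dict):
        return data
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("expected a JSON object or an array of JSON objects")
    merged: dict[str, Any] = {}
    for item in data:
        for key, value in item.items():
            # Fill a key only if it is absent or holds an "empty" value so far.
            if merged.get(key) in (None, "", [], {}):
                merged[key] = value
    return merged


print(coerce_single_object('[{"title": "Widget", "price": ""}, {"price": "9.99"}]'))
# {'title': 'Widget', 'price': '9.99'}
```

Merging is one design choice. Taking only the first element is simpler, but it may drop fields that the LLM spread across several objects.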

Udbhav8 • Oct 26 '24 04:10

Can you share a sample of the code you're running? I'm currently testing this, and seeing your code would help me review the problem.

unclecode • Nov 04 '24 08:11

@Udbhav8 Closing this issue due to inactivity. Please open a new issue if the problem still exists.

aravindkarnam • Jan 21 '25 09:01