Bug Report for Crawl4AI multiple async

Open jmontoyavallejo opened this issue 1 year ago • 1 comments

Hi UncleCode, I hope you are doing well!

First of all, I want to express my gratitude for creating Crawl4AI. It’s a fantastic tool for what I’m exploring.

I did come across a small bug that I wanted to bring to your attention. When I try to run the scraper with LLMs in concurrency, the output format doesn’t seem to align with the Pydantic schema, and it crashes.

This only happens when I’m running it with concurrency and combining it with other async scrapers. The output schema turns into index, tags, and content.
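
A minimal sketch of how the mismatch described above could be detected when several extractions run concurrently. It uses only the standard library; `fake_extract`, `EXPECTED_FIELDS`, and the example URLs are hypothetical stand-ins and are not part of Crawl4AI. The only detail taken from this report is the set of keys (index, tags, content) that shows up in place of the schema fields.

```python
import asyncio
import json

# Hypothetical fields that the caller's Pydantic schema is expected to produce.
EXPECTED_FIELDS = {"title", "price"}
# Keys reported in this issue when the output does not follow the schema.
FALLBACK_FIELDS = {"index", "tags", "content"}


def classify_output(extracted_json: str) -> str:
    """Return 'schema', 'fallback', or 'unknown' for one extraction result."""
    items = json.loads(extracted_json)
    if isinstance(items, dict):
        items = [items]
    # Collect every key seen across the returned items.
    keys = set().union(*(item.keys() for item in items if isinstance(item, dict)))
    if keys and EXPECTED_FIELDS <= keys:
        return "schema"
    if FALLBACK_FIELDS <= keys:
        return "fallback"
    return "unknown"


async def fake_extract(url: str) -> str:
    """Stand-in for a real crawl call (assumption: NOT a Crawl4AI function)."""
    await asyncio.sleep(0)  # yield control, as a real async crawl would
    if url.endswith("/bad"):
        return json.dumps([{"index": 0, "tags": [], "content": ["<div>"]}])
    return json.dumps([{"title": "Example", "price": 9.99}])


async def check_all(urls):
    """Run all extractions concurrently and classify each result by URL."""
    outputs = await asyncio.gather(*(fake_extract(u) for u in urls))
    return {url: classify_output(out) for url, out in zip(urls, outputs)}


if __name__ == "__main__":
    urls = ["https://example.com/ok", "https://example.com/bad"]
    print(asyncio.run(check_all(urls)))
```

Swapping `fake_extract` for the real crawl call would show which URLs come back in the fallback shape under concurrency, which may help narrow down a reproducible case.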

jmontoyavallejo avatar Oct 07 '24 16:10 jmontoyavallejo

Hello @jmontoyavallejo, thank you so much for your kind words. I would greatly appreciate it if you could provide a code sample that I can run to replicate the error you're facing. What you're saying sounds interesting. Please share sample code that demonstrates the issue when making concurrent requests to multiple URLs using the LLM. Thx

unclecode avatar Oct 08 '24 10:10 unclecode

Hi @unclecode, was this bug fixed? I think the issue still persists, if I am not wrong. I get the desired output when I use crawler.arun inside a for loop, but it gives me HTML tags when I use arun_many. I am attaching sample code and the output.

output.txt

crawler_arun_many.txt

Sorry if this bug was fixed and I am somehow implementing this wrong. Thanks. I really appreciate the effort you have given to create and maintain crawl4ai.

saurabhj9 avatar Jan 24 '25 13:01 saurabhj9

@saurabhj9 Thanks for sharing your code sample. Will you try your code again with the recent beta version? The way you work with the function arun_many() has changed. Here, I'll give you the links to the relevant part of the documentation and explain how to work with it. Most likely, the issue has already been resolved. Let me know if it hasn't.

https://docs.crawl4ai.com/advanced/multi-url-crawling/

unclecode avatar Jan 25 '25 11:01 unclecode