crawl4ai

Is this tool good for scraping forums?

Open iorilu opened this issue 1 year ago • 2 comments

I want to scrape a forum to collect data, possibly for LLM fine-tuning.

Forums have boards, boards have threads, and each thread includes details such as title, author, and datetime. Ideally the tool would also support paginated threads.

Is this tool a good fit for this requirement?

iorilu, Oct 21 '24 00:10

Thanks for your interest in our library. I'm happy to explain where things stand to help you decide. So far, our main focus has been a crawl function that is robust, fast, and extracts the proper information from a given URL, and I can say we have achieved that. The second part, our scraper module, is currently under development and will hopefully be available within two to three weeks. While crawling focuses on a single URL, the scraper module's goal is to traverse the website as a graph and extract all the information in a neat, organized manner.

Right now, you can use the crawler on a page, extract its internal and external links, decide which of those links you care about, and crawl them in turn. Alternatively, you can wait for the scraper module to be released. Keep in mind that the library is still progressing and keeps growing as more people use it and raise issues.
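
A minimal sketch of that crawl-links-then-crawl-again loop, under stated assumptions: `fetch_links` is a hypothetical callable (not part of the library) that wraps a single-URL crawl and returns the links found on that page, and `crawl_site` and `max_pages` are illustrative names. The traversal itself uses only the Python standard library, follows same-host links breadth-first, and skips external links and duplicates.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_site(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: crawl one URL, return its links
    max_pages: int = 100,
) -> list[str]:
    """Visit same-host pages breadth-first and return them in visit order."""
    root_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop "#fragment" so the same page
            # is not queued twice under different anchors.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != root_host:
                continue  # external link: skip it
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For a forum, board pages would lead to thread pages, and paginated thread URLs (for example a `?page=2` query string) are treated as distinct pages, so they are visited as well. In practice `fetch_links` would call the crawler on `url` and return the links from its result; extracting title, author, and datetime from each visited page is a separate step.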

That also means you will get really good support if you start using the project at this stage. We learn from your projects and improve the library, and in return you receive our support. Feel free to try it and let us know how it goes; we'll help each other along the way. Thank you again.

unclecode, Oct 21 '24 06:10

Thanks for the details. I will start by trying this.

iorilu, Oct 21 '24 23:10

@iorilu You're welcome. Please let me know if you run into any problems. I'm interested in helping and in learning more about people's use cases, which will also help improve the library.

unclecode, Oct 24 '24 11:10