Scrapegraph-ai
Scraping n levels deep
Is your feature request related to a problem? Please describe. I'd like to scrape a website n levels deep.
Describe the solution you'd like For example, given url = example.com, the scraper should also follow the links on example.com and scrape those pages too.
Describe alternatives you've considered I could use BeautifulSoup to download the linked pages and then feed them to this library.
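A minimal sketch of that manual workaround, assuming requests and BeautifulSoup are available (the URL and the fetch_with_links helper are illustrative, not part of this library):

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def fetch_with_links(url: str) -> tuple[str, list[str]]:
    """Download a page and return its HTML plus the absolute URLs it links to."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    return html, links

html, links = fetch_with_links("https://example.com")
# Each downloaded page (or followed link) could then be fed to the scraper individually.
```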
Hi @rawmean, we will add it to the to-do list for feature requests! It would be interesting to create a new graph for this, maybe calling it CrawlerGraph or DeepScraperGraph.
I'll try to take a stab at it. This is what I'm thinking, with a URL as input (a rough control-flow sketch follows the list):

1. FetchNode
2. ParseNode
3. RAGNode
4. SearchLinkNode -> get all the links on the page
5. (new) LinkFilterNode -> keep only the potentially relevant links
6. (new) RepeaterNode -> executes the graph from the child node onwards once for each input link, in parallel
7. FetchNode
8. ParseNode
9. RAGNode
10. (new) ContainsAnswerNode -> a new node type that can tell whether the current content contains the answer
11. (new) ConditionalNode -> a new node with two children; if the parent returns true, pick child 1, else pick child 2
    - 12a. GenerateAnswerNode
    - 12b. Go to step 4 for the next level of depth
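Here is a runnable sketch of that control flow, with requests/BeautifulSoup standing in for the fetch/parse nodes. It is not the actual Scrapegraph-ai API: contains_answer() is a naive keyword placeholder for the LLM-backed ContainsAnswerNode, the LinkFilterNode step is omitted, and the repeat loop runs serially rather than in parallel.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def contains_answer(text: str, prompt: str) -> bool:
    # Placeholder for the LLM-backed ContainsAnswerNode: a naive keyword check.
    return prompt.lower() in text.lower()

def deep_scrape(url: str, prompt: str, max_depth: int) -> str | None:
    html = requests.get(url, timeout=10).text           # FetchNode
    soup = BeautifulSoup(html, "html.parser")           # ParseNode
    text = soup.get_text(" ", strip=True)
    if contains_answer(text, prompt):                   # ContainsAnswerNode
        return text                                     # 12a: GenerateAnswerNode would run here
    if max_depth == 0:                                  # ConditionalNode: stop descending
        return None
    # SearchLinkNode; the LinkFilterNode step is omitted here for brevity.
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    for link in links:                                  # RepeaterNode (serial, not parallel)
        answer = deep_scrape(link, prompt, max_depth - 1)   # 12b: go one level deeper
        if answer is not None:
            return answer
    return None
```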
Let me know if this looks reasonable, or if you have another plan or a better alternative in mind.
Yeah, please contact me through email ([email protected]).
Sounds really interesting.
I am looking for this feature too. There are two use cases (a link-filtering sketch follows):

1. Loop through several path levels of a website to extract information from all item pages, e.g. all shop item information, or the prices and locations of all rental houses. In this case, I can specify which paths should be processed via regular expressions.
2. Loop through all pages of a small website. This behaves like a crawler such as Nutch, except that I can specify what to extract from each page: one prompt to match the target pages, and another prompt to get the data/files from each matched page. Sometimes I need to crawl all videos/images matching a specified condition from the website.
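A hypothetical illustration of use case 1: keep only the links whose path matches a user-supplied regular expression. The ITEM_PATTERN shape is an assumption and would be per-site configuration.

```python
import re

# Assumed URL shape for "item" detail pages; adjust per site.
ITEM_PATTERN = re.compile(r"/items/\d+$")

def filter_links(links: list[str], pattern: re.Pattern) -> list[str]:
    """Keep only the links whose path matches the user-supplied regex."""
    return [link for link in links if pattern.search(link)]

links = [
    "https://shop.example.com/items/42",
    "https://shop.example.com/about",
]
print(filter_links(links, ITEM_PATTERN))  # ['https://shop.example.com/items/42']
```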