[INS-2214] [Feature] Web Crawler Operator
Is There an Existing Issue for This?
- [X] I have searched the existing issues
Project
Instill VDP
Is your Proposal Related to a Problem?
No, it is a new feature request.
Describe Your Proposed Solution
We can implement a "Web Crawler" operator that takes an initial URL and a depth (int) as input, recursively extracts links from the fetched pages up to the given depth, and returns the extracted URLs as a list of strings.
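The proposed behaviour can be sketched as a breadth-first crawl that tracks how many hops each URL is from the starting page. This is only an illustration in plain Python (standard library only), not the Instill VDP operator interface. The names `crawl`, `extract_links` and `fetch` are hypothetical, and a few choices are assumptions: depth 0 returns no links, the start URL itself is left out of the result, URL fragments are dropped, and only http/https links are kept.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Hypothetical helper: resolve relative hrefs and drop #fragments."""
    parser = _LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def default_fetch(url: str) -> str:
    """Download a page as text (assumed UTF-8; undecodable bytes replaced)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, depth: int,
          fetch: Callable[[str], str] = default_fetch) -> List[str]:
    """Return URLs found within `depth` hops of start_url, in discovery order."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    seen = {start_url}          # every URL is queued (and fetched) at most once
    result: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level >= depth:      # links on this page would exceed the depth limit
            continue
        try:
            html = fetch(url)
        except (OSError, ValueError):  # network/HTTP errors or a malformed URL
            continue
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                result.append(link)
                queue.append((link, level + 1))
    return result
```

Making `fetch` a parameter keeps the traversal logic separate from networking, so it can be exercised on canned pages. The `seen` set prevents cycles (for example, a page linking back to the home page) from causing repeated fetches.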
Highlight the Benefits
Such an operator will be useful for crawling and gathering online data. For example, the links captured by it can then be fed to the text extraction operator to build a knowledge base from linked documents.
Anything Else?
No response