
[INS-2214] [Feature] Web Crawler Operator

Open praharshjain opened this issue 2 years ago • 5 comments

Is There an Existing Issue for This?

  • [X] I have searched the existing issues

Project

Instill VDP

Is your Proposal Related to a Problem?

No, it is a new feature request.

Describe Your Proposed Solution

We can implement a "Web Crawler" operator that takes an initial URL and a depth (int) as input. Starting from that URL, it recursively extracts links from each page it reaches, up to the given depth, and returns the extracted URLs as a list of strings.
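A minimal sketch of the proposed behavior, in Python and using only the standard library. This is not the operator's actual implementation: the names `crawl`, `fetch_html`, and `_LinkParser` are hypothetical, the traversal is an iterative breadth-first walk rather than literal recursion, and it assumes that `depth=1` means "extract links from the initial page only". The `fetch` parameter is an assumption added so the logic can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url: str) -> str:
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, depth: int,
          fetch: Callable[[str], str] = fetch_html) -> List[str]:
    """Return the URLs discovered within `depth` levels of `start_url`.

    Assumption: depth=0 returns nothing, depth=1 returns the links on the
    start page, depth=2 also follows those links once, and so on.
    """
    seen = {start_url}           # avoid revisiting pages and looping forever
    found: List[str] = []        # discovered URLs, in discovery order
    queue = deque([(start_url, 0)])

    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue             # depth limit reached: do not expand this page
        try:
            html = fetch(url)
        except Exception:
            continue             # skip pages that cannot be fetched
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if not link.startswith(("http://", "https://")) or link in seen:
                continue
            seen.add(link)
            found.append(link)
            queue.append((link, level + 1))
    return found
```

A real operator would likely also need politeness controls (robots.txt, rate limiting, a cap on total pages) and possibly a same-domain restriction, none of which are specified in this proposal.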

Highlight the Benefits

Such an operator will be useful for crawling and gathering online data. For example, the links captured by it can then be fed to the text extraction operator to build a knowledge base from linked documents.

Anything Else?

No response

INS-2214

praharshjain avatar Sep 29 '23 19:09 praharshjain

This issue is a great way to kick-start your journey with our project, or to make a positive impact on open-source development. Jump in!

:sparkles: Thank you for your contribution! :sparkles:

github-actions[bot] avatar Sep 29 '23 19:09 github-actions[bot]

Can you please assign this to me?

AnkitaMalik22 avatar Oct 01 '23 16:10 AnkitaMalik22

Hello @praharshjain, please assign this issue to me, as I have already worked on this kind of problem in the past and have a lot of experience with it.

itssiddhantjain avatar Oct 02 '23 11:10 itssiddhantjain

Hey @praharshjain, I want to work on this issue. It would be my first work in AI, so I would really like to take it on. Thank you! Please assign it to me.

lazyMonk1010 avatar Oct 03 '23 15:10 lazyMonk1010

Can you please assign this to me?

Hi @AnkitaMalik22! Absolutely, we’re thrilled about your interest in our project! :rocket: Here’s the Contributing Guideline for Instill VDP to get you started on your journey! Please refer to the Contributing Guidelines for components as well. Don’t forget to link your pull request to the corresponding issue, and after your PR gets merged, please complete this form to claim your well-deserved points! If you ever have any questions or need a hand along the way, don’t hesitate to drop a message in this thread or hop into our Discord. Happy contributing! :blush::star2:

harshsoni7 avatar Oct 04 '23 08:10 harshsoni7