Add functionality to scrape url with Jina AI

Open noamsiegel opened this issue 1 year ago • 5 comments

What this Pull Request (PR) does

  • Adds a --scrape_url (or -su) CLI option that fetches the content of a webpage as markdown
  • Uses Jina AI to do the conversion
  • No API key is needed

How to use it

The easiest way to use it is with this format: fabric -su {URL} | fabric -sp {fabric}. This scrapes the {URL}, transforms it into markdown, and then pipes it into fabric. An example is provided in the screenshot below.
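
For reviewers who want to see the idea without reading the diff, here is a minimal sketch of the scraping step. It is not the code in this PR. It assumes the public Jina AI Reader endpoint, which returns a page as markdown when the target URL is appended to https://r.jina.ai/. The names reader_url and scrape_url are hypothetical and used only for illustration.

```python
import urllib.request

# Assumption: the public Jina AI Reader endpoint, which needs no API key.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader URL for a page (hypothetical helper, not from the PR)."""
    target_url = target_url.strip()
    if not target_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got: {target_url!r}")
    return JINA_READER_PREFIX + target_url


def scrape_url(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the markdown rendering of a page. Requires network access."""
    request = urllib.request.Request(
        reader_url(target_url),
        headers={"User-Agent": "fabric-scrape-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Decode leniently so one bad byte does not abort the whole scrape.
        return response.read().decode("utf-8", errors="replace")
```

Writing the result of scrape_url to standard output is what would make the pipe form above work, since the second fabric invocation reads the markdown from standard input.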

Screenshots

image

noamsiegel avatar Jun 13 '24 01:06 noamsiegel

Great addition!

silverstreak avatar Jun 14 '24 11:06 silverstreak

Since Fabric is transitioning to Go, will all PRs to the original repo be held in limbo?

noamsiegel avatar Jun 17 '24 19:06 noamsiegel

I really want this added; it would be great tooling to have.

timothyjoh avatar Jun 21 '24 21:06 timothyjoh

Looks like there is a ton of noise in this PR, possibly due to some formatter that changed it. While I approve of the formatting changes, it would be better to separate that into another PR so that this Jina addition is easier to approve.

timothyjoh avatar Jun 21 '24 21:06 timothyjoh

I just removed all the formatting changes. Sorry about that. Let me know what else should be changed for this to be added!

noamsiegel avatar Jun 26 '24 22:06 noamsiegel

This is great! We're working on some more robust scraping options, though, so I think we should wait and address this after it moves to Go.

danielmiessler avatar Jul 01 '24 20:07 danielmiessler