Fabric icon indicating copy to clipboard operation
Fabric copied to clipboard

Scrape url

Open noamsiegel opened this issue 1 year ago • 1 comments

What this Pull Request (PR) does

  • added --scrape_url or -u CLI command to curl the content of a webpage in markdown form
  • uses Jina AI
  • no API key is needed

Screenshots

Video Overview

How to use it

The easiest way to use it is with this format: fabric -scrape_url {URL} | fabric --stream --pattern {fabric}. This scrapes the {URL}, transforms it into markdown, and then pipes it into fabric.

noamsiegel avatar Aug 21 '24 18:08 noamsiegel

also s.jina can be added, ive added it on my own path and use it ez

Uddhav07 avatar Aug 23 '24 17:08 Uddhav07

Hi, thank you for the PR.

I see two problems:

  1. You are calling external command here, it will not work for Windows and if curl is not installed. We would need a native Go implementation. Maybe there is a Go client library for the service or to use some Rest API client libraries to call the service.

curlCommand := fmt.Sprintf("curl https://r.jina.ai/%s", url) fmt.Println("Executing command:", curlCommand) // Debug print if err := exec.Command("sh", "-c", curlCommand).Run(); err != nil { return "", fmt.Errorf("failed to run curl command: %w", err) }`

  1. Fabric uses OOP programing style. So, we would need a struct/methods implementation. E.g. check the YouTube impl. youtube/youtube.go

eugeis avatar Aug 29 '24 17:08 eugeis

also s.jina can be added, ive added it on my own path and use it ez

I made a similar tool (https://github.com/johnconnor-sec/fabric/tree/jina-tool).

Maybe we all could work together to make it great... ?!

johnconnor-sec avatar Sep 02 '24 18:09 johnconnor-sec

Holy crap this is incredible. And you did a video showing it!

One question / concern is how scalable / dependable jina ai will be as a solution, especially if tons of us are using this.

Also, how well it scrapes content that's resistant to scraping.

Thoughts?

danielmiessler avatar Sep 08 '24 00:09 danielmiessler

Hi, thank you for the PR.

I see two problems:

  1. You are calling external command here, it will not work for Windows and if curl is not installed. We would need a native Go implementation. Maybe there is a Go client library for the service or to use some Rest API client libraries to call the service.

curlCommand := fmt.Sprintf("curl https://r.jina.ai/%s", url) fmt.Println("Executing command:", curlCommand) // Debug print if err := exec.Command("sh", "-c", curlCommand).Run(); err != nil { return "", fmt.Errorf("failed to run curl command: %w", err) }`

  1. Fabric uses OOP programing style. So, we would need a struct/methods implementation. E.g. check the YouTube impl. youtube/youtube.go

Great points here. We need to make sure all functionality in the app works across all the platforms.

danielmiessler avatar Sep 08 '24 01:09 danielmiessler

Holy crap this is incredible. And you did a video showing it!

One question / concern is how scalable / dependable jina ai will be as a solution, especially if tons of us are using this.

Also, how well it scrapes content that's resistant to scraping.

Thoughts?

It's completely free with rate limit and if we add api key which is free we get millions of tokens without rate limit

Uddhav07 avatar Sep 08 '24 04:09 Uddhav07

@eugeis @danielmiessler @johnconnor-sec

I made an update video on the new commits that were pushed.

Some changes can be made before bringing this to production. I wanted to commit these as soon as possible to start the feedback loop.

Thanks for the great feedback -- this was a great excuse to learn Go!

noamsiegel avatar Sep 10 '24 04:09 noamsiegel

@noamsiegel Thank you for adaptions.

Did I understood it correct, that the API key is optional?

If yes, we need to adapt here ... client.ApiKey = client.AddSetupQuestion("API Key", true), ... false ...

if check if it is empty don't send the Auth.. header.

eugeis avatar Sep 13 '24 09:09 eugeis

@eugeis I Just pushed the fix to make the Jina API key optional. If blank, it does not set the header token for the request.

noamsiegel avatar Sep 16 '24 18:09 noamsiegel