
[Feature] :grey_question: Support for HTML/PDF pages and YT videos?

Open adriens opened this issue 2 years ago • 5 comments

Is this your first time submitting a feature request?

  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing functionality

Describe the feature

Hi, I see in the README that Canopy supports local files like Parquet. Is there any kind of DocumentLoader for online content such as:

  • YT Video
  • Html page

Also, does it support importing local PDF files?

Describe alternatives you've considered

  • Scrape web content into text files, then load them
  • For YT: build a transcript, then load the .txt output
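The first workaround (scrape a page into a text file, then load that file) can be sketched with only the Python standard library. This is a minimal sketch, not part of Canopy: the helper names `html_to_text` and `save_as_txt` are hypothetical, and it assumes the page's visible text is good enough without any boilerplate or navigation filtering.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text runs, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text run per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def save_as_txt(html: str, out_path: Path) -> Path:
    """Write the extracted text to a .txt file so it can be loaded like any local file."""
    out_path.write_text(html_to_text(html), encoding="utf-8")
    return out_path
```

To fetch a live page first, something like `urllib.request.urlopen(url).read().decode("utf-8")` can supply the `html` argument; real sites may need a proper user agent, charset detection, or a dedicated extraction library.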

Who will this benefit?

I guess almost any lazy person who wants to give Canopy a try.

Are you interested in contributing this feature?

No response

Anything else?

No response

adriens, Nov 09 '23 20:11

Would it be possible to use an existing Pinecone index with Canopy? I built mine from PDF files using llama-index, and the free version of Pinecone does not support more than one index, so it would be great to reuse the existing one.

UPD: I found in the docs that this is not possible. Before that, I had tried to work around it by first creating a Pinecone index through Canopy and then filling it externally. The server started and ran without issues, but when I tried to chat, an error was thrown. So I converted all the .pdf files in the directory to .txt files and upserted them through Canopy.
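Instead of one .txt file per PDF, the extracted text can also be packed into a single JSONL file. The sketch below assumes each record needs `id` and `text` fields, with optional `source` and `metadata`, which is my reading of the data format described in the Canopy README; double-check it against your Canopy version. The PDF text extraction itself is out of scope here (any PDF library can produce the input dict), and the helper names and the `file_type` metadata key are hypothetical.

```python
import json
from pathlib import PurePath


def to_canopy_records(texts: dict[str, str]) -> list[dict]:
    """Turn {source_path: extracted_text} into id/text/source/metadata records.

    Assumption: the file stem is unique across documents, so it can serve as the id.
    """
    records = []
    for source, text in texts.items():
        if not text.strip():
            continue  # skip files whose text extraction produced nothing
        path = PurePath(source)
        records.append({
            "id": path.stem,
            "text": text,
            "source": source,
            "metadata": {"file_type": path.suffix.lstrip(".").lower()},
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

The result could then be written with `Path("data.jsonl").write_text(to_jsonl(records), encoding="utf-8")` and loaded with `canopy upsert`, as with any other supported local file.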

strelkon, Nov 13 '23 10:11

PDF support is a must-have.

pashpashpash, Dec 08 '23 18:12

Yes, please add PDF support to Canopy.

NB-123, Jan 08 '24 12:01

@adriens @NB-123 @pashpashpash @strelkon Check out this library: PineconePDFExtractor. It accepts a list of PDF files and converts them to the format you need, so you can upsert the data through pinecone-canopy's FastAPI endpoint v1/context/upsert.
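For reference, a call to that endpoint could be built with the standard library alone. This is a hedged sketch: the base URL `http://localhost:8000` and the request body shape `{"documents": [...]}` are assumptions, not confirmed in this thread, so check the running server's FastAPI docs (usually served at `/docs`) before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a locally running Canopy server; adjust host/port to your setup.
BASE_URL = "http://localhost:8000"


def build_upsert_request(documents: list[dict], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the v1/context/upsert endpoint.

    Assumption: the endpoint expects a JSON body of the form {"documents": [...]},
    where each document carries id/text/source/metadata fields.
    """
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/context/upsert",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (raises on HTTP errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything hits the server.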

kowshik24, Jan 20 '24 23:01

Thanks a lot for pointing out this library, @kowshik24 :pray:

adriens, Jan 21 '24 05:01