[Feature Request]: I'd like to specify the appropriate Reader for each file found while using SharePointReader

Open • ferdinandosimonetti opened this issue 1 year ago • 1 comment

Feature Description

Hi, currently I'm obtaining my test Documents by scanning a local directory:

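# Import path for llama_index releases before 0.10 (the llama-hub era)
from llama_index import SimpleDirectoryReader, download_loader

# Attach the file name to each Document's metadata
filename_fn = lambda filename: {"file_name": filename}

# Fetch one Reader class per file format from llama-hub
DocxReader = download_loader("DocxReader")
PptxReader = download_loader("PptxReader")
PandasExcelReader = download_loader("PandasExcelReader")
PDFReader = download_loader("PDFReader")

# mytime() is my own timing/logging helper and docpath is the local directory (both defined elsewhere)
mytime("start multiple file types read")
# Map each file extension to the Reader instance that should parse it
dir_reader = SimpleDirectoryReader(docpath, file_metadata=filename_fn, filename_as_id=True, file_extractor={
  ".docx": DocxReader(),
  ".pptx": PptxReader(),
  ".xlsx": PandasExcelReader(),
  ".pdf": PDFReader()
})
documents = dir_reader.load_data()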

However, the real documents are stored in a SharePoint site and directory (which I, unfortunately, can't test right now). Is there a way to use SharePointReader while retaining the ability to customize the Document id/metadata, as well as the specific Reader for each file format?
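
To make the requested behavior concrete, here is a minimal sketch in plain Python of the per-extension dispatch I have in mind. It does not use SharePointReader's actual API: `load_with_extractors`, the dict-based document and the toy readers are all hypothetical stand-ins, and the parameter names simply mirror the SimpleDirectoryReader ones above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical reader type: takes a file path, returns the extracted text.
Reader = Callable[[Path], str]


def load_with_extractors(
    paths: Iterable[str],
    file_extractor: Dict[str, Reader],
    file_metadata: Optional[Callable[[str], dict]] = None,
    filename_as_id: bool = False,
    default_reader: Optional[Reader] = None,
) -> List[dict]:
    """Pick a reader per file extension and build simple dict 'documents'."""
    documents = []
    for raw_path in paths:
        path = Path(raw_path)
        # Look up the reader by lower-cased extension, e.g. ".PDF" -> ".pdf"
        reader = file_extractor.get(path.suffix.lower(), default_reader)
        if reader is None:
            # No reader registered for this format and no fallback: skip it
            continue
        doc = {
            "text": reader(path),
            "metadata": file_metadata(str(path)) if file_metadata else {},
        }
        if filename_as_id:
            doc["id"] = str(path)
        documents.append(doc)
    return documents


# Toy readers standing in for DocxReader, PDFReader, etc. (assumption:
# the real ones would parse the file downloaded from SharePoint).
docs = load_with_extractors(
    ["report.docx", "slides.PDF", "notes.txt"],
    file_extractor={
        ".docx": lambda p: "docx:" + p.name,
        ".pdf": lambda p: "pdf:" + p.name,
    },
    file_metadata=lambda filename: {"file_name": filename},
    filename_as_id=True,
)
```

In this sketch `notes.txt` is skipped because no reader is registered for `.txt`; passing a `default_reader` would handle such files instead.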

Reason

SharePointReader's README doesn't mention additional parameters like file_extractor, file_metadata, or filename_as_id.

Value of Feature

Being able to specify a (more) appropriate Reader for each file format could lead to better content interpretation afterwards, I suppose.

ferdinandosimonetti, Feb 08 '24 10:02

There is a PR for this: https://github.com/run-llama/llama-hub/pull/934

ferdinandosimonetti, Feb 09 '24 17:02