llama-hub
[Feature Request]: I'd like to specify the appropriate Reader for each file found while using SharePointReader
Feature Description
Hi, I'm currently obtaining my test Documents by scanning a local directory:
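from llama_index import SimpleDirectoryReader, download_loader

# Attach the file name to each Document's metadata.
filename_fn = lambda filename: {"file_name": filename}
# Fetch one dedicated loader per file format from llama-hub.
DocxReader = download_loader("DocxReader")
PptxReader = download_loader("PptxReader")
PandasExcelReader = download_loader("PandasExcelReader")
PDFReader = download_loader("PDFReader")
mytime("start multiple file types read")  # mytime is my own timing helper
# docpath is the path to my local test directory (defined elsewhere).
dir_reader = SimpleDirectoryReader(
    docpath,
    file_metadata=filename_fn,
    filename_as_id=True,
    file_extractor={
        ".docx": DocxReader(),
        ".pptx": PptxReader(),
        ".xlsx": PandasExcelReader(),
        ".pdf": PDFReader(),
    },
)
documents = dir_reader.load_data()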
However, the real documents are stored in a SharePoint site and directory (which, unfortunately, I can't test right now). I was wondering whether there's a way to use SharePointReader while retaining the ability to customize the Document id/metadata, as well as the specific Reader for each file format.
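To make the request concrete, here is a rough sketch of the per-extension dispatch I have in mind. It is plain Python and does not call SharePointReader or any llama-hub API: `load_with_extractors`, `fake_pdf_reader` and `fake_default_reader` are hypothetical names invented for illustration, and the id format is only an assumption. It assumes the files have already been fetched to local paths by some other step.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a reader: takes a path, returns a list of text parts.
Reader = Callable[[Path], List[str]]


def load_with_extractors(
    paths: List[str],
    file_extractor: Dict[str, Reader],
    default_reader: Reader,
    file_metadata: Optional[Callable[[str], dict]] = None,
    filename_as_id: bool = False,
) -> List[dict]:
    """Pick a reader per file extension and build simple document dicts."""
    docs = []
    for raw in paths:
        path = Path(raw)
        # Fall back to the default reader when no extension-specific one is set.
        reader = file_extractor.get(path.suffix.lower(), default_reader)
        metadata = file_metadata(str(path)) if file_metadata else {}
        for i, text in enumerate(reader(path)):
            doc = {"text": text, "metadata": metadata}
            if filename_as_id:
                # Illustrative id scheme (an assumption, not a library guarantee).
                doc["id"] = f"{path.name}_part_{i}"
            docs.append(doc)
    return docs


# Stub readers so the sketch runs without touching disk or SharePoint.
def fake_pdf_reader(path: Path) -> List[str]:
    return [f"pdf text from {path.name}"]


def fake_default_reader(path: Path) -> List[str]:
    return [f"plain text from {path.name}"]


docs = load_with_extractors(
    ["report.pdf", "notes.txt"],
    file_extractor={".pdf": fake_pdf_reader},
    default_reader=fake_default_reader,
    file_metadata=lambda filename: {"file_name": filename},
    filename_as_id=True,
)
```

In this sketch the `.pdf` file goes through its dedicated reader while `notes.txt` falls back to the default one, which is the behavior I would hope SharePointReader could expose through similar parameters.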
Reason
SharePointReader's README makes no mention of additional parameters such as file_extractor, file_metadata, or filename_as_id.
Value of Feature
Being able to specify a (more) appropriate Reader for each file format could lead to better content interpretation afterwards, I suppose.
There is a PR for this: https://github.com/run-llama/llama-hub/pull/934