langchain
enhancement: optional file loader in `GoogleDriveLoader`
Summary
Adds an optional `file_loader_cls` argument (with accompanying `file_loader_kwargs`) to `GoogleDriveLoader`, letting users choose the loader class used for files in Google Drive that are not Google Docs or Google Sheets.
Fixes #5791. See also this Twitter thread where a user requested this capability.
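The routing described above can be sketched as follows. This is a minimal, hypothetical sketch, not the actual `GoogleDriveLoader` code: `choose_handler`, `load_with_file_loader`, and `UpperCaseLoader` are made-up names for illustration. It assumes the custom loader class is constructed with the downloaded bytes as a file-like `file` argument plus the entries of `file_loader_kwargs`, which matches the constructor shape of `UnstructuredFileIOLoader` (`file`, `mode`).

```python
from io import BytesIO
from typing import Any, Dict, List, Optional, Type

# MIME types Google Drive reports for native Google Docs and Google Sheets.
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"
GOOGLE_SHEET_MIME = "application/vnd.google-apps.spreadsheet"


def choose_handler(mime_type: str, file_loader_cls: Optional[Type] = None) -> str:
    """Hypothetical router: decide how a Drive file would be processed."""
    if mime_type == GOOGLE_DOC_MIME:
        return "google_doc"
    if mime_type == GOOGLE_SHEET_MIME:
        return "google_sheet"
    if file_loader_cls is not None:
        # Any other file type is delegated to the user-supplied loader class.
        return "file_loader"
    # No custom loader given: fall back to the loader's default handling.
    return "default"


def load_with_file_loader(
    content: bytes,
    file_loader_cls: Type,
    file_loader_kwargs: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Wrap downloaded bytes in a file-like object and delegate to the loader."""
    file_obj = BytesIO(content)
    loader = file_loader_cls(file=file_obj, **(file_loader_kwargs or {}))
    return loader.load()


class UpperCaseLoader:
    """Toy stand-in for a real loader such as UnstructuredFileIOLoader."""

    def __init__(self, file, mode: str = "single"):
        self.file = file
        self.mode = mode

    def load(self) -> List[str]:
        text = self.file.read().decode("utf-8").upper()
        # "elements" mode returns one item per line; "single" returns one item.
        return text.splitlines() if self.mode == "elements" else [text]


if __name__ == "__main__":
    print(choose_handler("application/pdf", UpperCaseLoader))
    print(load_with_file_loader(b"hello\nworld", UpperCaseLoader, {"mode": "elements"}))
```

Passing the kwargs through untouched keeps the Drive loader agnostic about which loader is plugged in; each loader class validates its own options.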
Testing
For individual files (substitute a file ID from your own Google Drive):
```python
from langchain.document_loaders import GoogleDriveLoader
from langchain.document_loaders import UnstructuredFileIOLoader

file_id = "1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz"
loader = GoogleDriveLoader(
    file_ids=[file_id],
    # Loader class for files that are not Google Docs or Google Sheets
    file_loader_cls=UnstructuredFileIOLoader,
    # Extra keyword arguments forwarded to file_loader_cls
    file_loader_kwargs={"mode": "elements"},
)
docs = loader.load()
```
For a folder:
```python
from langchain.document_loaders import GoogleDriveLoader
from langchain.document_loaders import UnstructuredFileIOLoader

# Substitute a folder ID from your own Google Drive
folder_id = "1asMOHY1BqBS84JcRbOag5LOJac74gpmD"
loader = GoogleDriveLoader(
    folder_id=folder_id,
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)
docs = loader.load()
```
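With `mode="elements"`, the loader returns many small documents, one per detected element, rather than one document per file. A minimal sketch for inspecting that output, assuming each returned document exposes a `metadata` dict with a `category` key (Unstructured-based loaders in element mode typically record the element type there). `count_categories` and `filter_by_category` are hypothetical helpers, not part of LangChain, and the `SimpleNamespace` objects are stand-ins for real `Document` instances.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def count_categories(docs: Iterable[Any]) -> Dict[str, int]:
    """Tally element categories; documents without one count as "unknown"."""
    counts = Counter(doc.metadata.get("category", "unknown") for doc in docs)
    return dict(counts)


def filter_by_category(docs: Iterable[Any], category: str) -> List[Any]:
    """Keep only the documents whose metadata category matches exactly."""
    return [doc for doc in docs if doc.metadata.get("category") == category]


if __name__ == "__main__":
    # Stand-ins for documents returned by loader.load() in "elements" mode.
    sample = [
        SimpleNamespace(page_content="Report", metadata={"category": "Title"}),
        SimpleNamespace(page_content="Body text.", metadata={"category": "NarrativeText"}),
        SimpleNamespace(page_content="More text.", metadata={"category": "NarrativeText"}),
    ]
    print(count_categories(sample))
    print([d.page_content for d in filter_by_category(sample, "NarrativeText")])
```

With a real loader, the output of `loader.load()` from either example above can be passed in directly, e.g. `count_categories(loader.load())`.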
Who can review?
@hwchase17 @eyurtsev
Thanks for doing this @MthwRobinson - super helpful.