enhancement: optional file loader in `GoogleDriveLoader`

Open MthwRobinson opened this issue 1 year ago • 1 comment

Summary

Lets users pass a `file_loader_cls` (with optional `file_loader_kwargs`) to `GoogleDriveLoader` so that files in Google Drive that are not Google Docs or Google Sheets can also be processed.

Fixes #5791. See also this Twitter thread where a user requested this capability.
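
To make the intended behavior concrete, here is a minimal, self-contained sketch of the dispatch idea. It is not the code in this PR: `load_drive_file` and `EchoFileLoader` are hypothetical names, documents are represented as plain dicts, and skipping unsupported files when no loader class is given is an assumption. The only things taken from the PR are the `file_loader_cls` and `file_loader_kwargs` parameter names.

```python
import io
from typing import Any, Dict, List, Optional

# MIME types Google Drive uses for native Docs and Sheets.
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"


class EchoFileLoader:
    """Hypothetical stand-in for a file-IO loader such as UnstructuredFileIOLoader."""

    def __init__(self, file: io.BytesIO, **kwargs: Any) -> None:
        self.file = file
        self.kwargs = kwargs

    def load(self) -> List[Dict[str, Any]]:
        # Decode the raw bytes and record the loader kwargs as metadata.
        text = self.file.read().decode("utf-8")
        return [{"page_content": text, "metadata": dict(self.kwargs)}]


def load_drive_file(
    mime_type: str,
    content: bytes,
    file_loader_cls: Optional[type] = None,
    file_loader_kwargs: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical dispatcher: native types first, then the optional loader."""
    if mime_type in (GOOGLE_DOC, GOOGLE_SHEET):
        # Native Docs and Sheets keep their built-in handling.
        kind = "native-doc" if mime_type == GOOGLE_DOC else "native-sheet"
        return [{"page_content": content.decode("utf-8"), "metadata": {"handler": kind}}]
    if file_loader_cls is None:
        # Assumption: without a loader class, other file types are skipped.
        return []
    # Any other file is handed to the user-supplied loader as a file object.
    loader = file_loader_cls(file=io.BytesIO(content), **(file_loader_kwargs or {}))
    return loader.load()


docs = load_drive_file(
    "text/plain",
    b"hello",
    file_loader_cls=EchoFileLoader,
    file_loader_kwargs={"mode": "elements"},
)
print(docs)
```

Passing a class plus a kwargs dict, rather than a loader instance, lets a separate loader be constructed for each downloaded file.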

Testing

For individual files (use a file ID from your own Google Drive):

from langchain.document_loaders import GoogleDriveLoader, UnstructuredFileIOLoader

# Replace with the ID of a file in your own Drive.
file_id = "1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz"
loader = GoogleDriveLoader(
    file_ids=[file_id],
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)
docs = loader.load()

For a folder:

from langchain.document_loaders import GoogleDriveLoader, UnstructuredFileIOLoader

# Replace with the ID of a folder in your own Drive.
folder_id = "1asMOHY1BqBS84JcRbOag5LOJac74gpmD"
loader = GoogleDriveLoader(
    folder_id=folder_id,
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)
docs = loader.load()

Who can review?

@hwchase17 @eyurtsev

MthwRobinson • Jun 09 '23 15:06

Thanks for doing this @MthwRobinson - super helpful.

RobSpectre • Jun 15 '23 15:06