langchain
Add multi file patterns globbing for DirectoryLoader()
This PR replaces the old glob arg with a new arg, file_pattern: Optional[set] = None, that specifies the file pattern(s) you want to glob, e.g. {".pdf"} or {".pdf", ".docx"}.
To load all files in the directory, simply leave out the arg.
The globbing is still done with Path.glob() or Path.rglob(), as before. The added algorithm globs the directory once, rather than once per pattern, which makes loading faster.
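For reviewers, here is a minimal sketch of the single-pass idea. It is not the code in this PR: the helper name iter_matching_files and its recursive flag are hypothetical, and it assumes each entry in file_pattern is a file suffix such as ".pdf" that is compared against Path.suffix.

```python
from pathlib import Path
from typing import Iterator, Optional, Set


def iter_matching_files(
    root: Path,
    file_pattern: Optional[Set[str]] = None,
    recursive: bool = False,
) -> Iterator[Path]:
    """Yield files under root whose suffix is in file_pattern.

    Hypothetical helper for illustration only. If file_pattern is None,
    every file is yielded.
    """
    # Walk the directory exactly once, however many patterns were given.
    walker = root.rglob("*") if recursive else root.glob("*")
    for path in walker:
        # glob("*") also returns directories, so skip anything that is not a file.
        if not path.is_file():
            continue
        # Set membership is O(1), so extra patterns add no extra directory scans.
        if file_pattern is None or path.suffix in file_pattern:
            yield path
```

Calling iter_matching_files(Path("docs"), {".pdf", ".docx"}) would yield only the PDF and Word files directly inside docs, and omitting the second argument would yield every file. The suffix comparison here is case-sensitive, which may or may not match what the PR intends.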
@hwchase17 @eyurtsev
Linking to original issue
@marcusyatim thanks for helping out with the feature request! I outlined a few places where changes are required before we can merge this in.
stale