[Feature] Google Drive Folder support as a data source
Description
This change allows embedchain users to use a Google Drive folder as a data source. It is implemented as a wrapper around LangChain's `GoogleDriveLoader`.
Fixes #525
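The wrapper idea can be sketched in Python as follows. This is not the PR's code. The class and function names, the folder-URL pattern, and the `{"doc_id": ..., "data": [...]}` return shape are assumptions for illustration. The underlying loader is injected, so the sketch runs without LangChain or network access.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Stand-in for LangChain's Document (it exposes .page_content and .metadata);
# defined here only so the sketch runs without LangChain installed.
@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical pattern for folder URLs such as
# https://drive.google.com/drive/folders/<id> or .../drive/u/0/folders/<id>?usp=sharing
_FOLDER_URL = re.compile(
    r"^https://drive\.google\.com/drive/(?:u/\d+/)?folders/([A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def folder_id_from_url(url: str) -> str:
    """Extract the Drive folder id from a folder URL, or raise ValueError."""
    match = _FOLDER_URL.match(url)
    if match is None:
        raise ValueError(f"Not a Google Drive folder URL: {url}")
    return match.group(1)


class GoogleDriveFolderWrapper:
    """Hypothetical wrapper that adapts a LangChain-style loader to a
    {"doc_id": ..., "data": [...]} result (shape assumed for this sketch)."""

    def __init__(self, make_loader: Callable[[str], Any]):
        # make_loader(folder_id) must return an object whose .load() yields Documents
        self._make_loader = make_loader

    def load_data(self, url: str) -> Dict[str, Any]:
        folder_id = folder_id_from_url(url)
        docs: List[Document] = self._make_loader(folder_id).load()
        # Keep each file's text and metadata, tagging it with the source folder URL
        data = [
            {"content": doc.page_content, "meta_data": {**doc.metadata, "url": url}}
            for doc in docs
        ]
        # Stable id derived from the folder URL plus all extracted text
        joined = url + "".join(item["content"] for item in data)
        doc_id = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        return {"doc_id": doc_id, "data": data}
```

With LangChain installed, `make_loader` could be `lambda folder_id: GoogleDriveLoader(folder_id=folder_id, recursive=True)`, importing `GoogleDriveLoader` from `langchain_community.document_loaders` (`langchain.document_loaders` in older releases). Injecting the loader also makes the wrapper easy to unit-test with a fake.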
Type of change
- [x] New feature (non-breaking change which adds functionality)
How Has This Been Tested?
This feature was tested with new unit tests and by running the client code from the documentation with `data_type` set to `"google_drive_folder"`. The Google API credentials must be set up correctly for this feature to work.
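Because missing credentials only surface as an authentication error once loading starts, a pre-flight check can fail earlier with a clearer message. The sketch below is hypothetical. It assumes the default locations LangChain's `GoogleDriveLoader` is documented to use (`~/.credentials/keys.json`, `token.json`, `credentials.json`, overridable via its `service_account_key`, `token_path` and `credentials_path` arguments); verify these against your installed version.

```python
from pathlib import Path
from typing import Optional

# Assumed default file names under ~/.credentials; the lookup order is an assumption.
DEFAULT_CANDIDATES = ("keys.json", "token.json", "credentials.json")


def _credentials_dir(home: Optional[Path]) -> Path:
    return (home if home is not None else Path.home()) / ".credentials"


def find_google_credentials(home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing credentials file under <home>/.credentials, else None."""
    base = _credentials_dir(home)
    for name in DEFAULT_CANDIDATES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None


def require_google_credentials(home: Optional[Path] = None) -> Path:
    """Fail fast with a readable message instead of an auth error mid-load."""
    found = find_google_credentials(home)
    if found is None:
        raise FileNotFoundError(
            f"No Google API credentials found in {_credentials_dir(home)}; "
            f"expected one of: {', '.join(DEFAULT_CANDIDATES)}"
        )
    return found
```

Calling `require_google_credentials()` before adding a Drive folder turns a confusing OAuth failure into a direct "file not found" message.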
- Unit Test
Checklist:
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged and published in downstream modules
- [x] I have checked my code and corrected any misspellings
Maintainer Checklist
- [x] closes #525
- [x] Made sure Checks passed
Thanks @JoeSL for the PR. This is going to be really useful for the community.
Can you please do the following before we can review the PR:
- Resolve conflicts
- Rename the files, classes and other things from `Google Drive Folder` to `Google Drive`? We want to keep the effort low for users when adding data sources. So, `google_drive_folder` will become `google_drive`, `GoogleDriveFolderChunker` will become `GoogleDriveChunker`, and so on.
Thanks again for this PR.
Hello @deshraj, I have updated the PR with the requested changes.
Codecov Report
Attention: 18 lines in your changes are missing coverage. Please review.
Comparison is base (ae2e9cb) 57.81% compared to head (2b00e3c) 57.89%. Report is 3 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| embedchain/loaders/google_drive.py | 50.00% | 16 Missing :warning: |
| embedchain/utils.py | 71.42% | 2 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1106      +/-   ##
==========================================
+ Coverage   57.81%   57.89%   +0.07%
==========================================
  Files         131      133       +2
  Lines        5149     5201      +52
==========================================
+ Hits         2977     3011      +34
- Misses       2172     2190      +18
```
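The figures in the coverage diff are consistent with coverage being hits divided by tracked lines, and misses being the remainder:

```math
\text{coverage} = \frac{\text{hits}}{\text{lines}}:\quad \frac{2977}{5149} \approx 57.817\%,\qquad \frac{3011}{5201} \approx 57.893\%
```

```math
\text{misses} = \text{lines} - \text{hits}:\quad 5149 - 2977 = 2172,\qquad 5201 - 3011 = 2190
```

The change is about 0.076 percentage points, reported as +0.07; together with 57.817% being shown as 57.81%, this suggests the report truncates rather than rounds to two decimals.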
Looks good. Thanks for adding this feature.
The pleasure is mine. Embedchain is a great framework! Please let me know if you have any other issues I can help with.