AutoGPT
Stop ignoring small files in split_file
Background
The split_file function ignores small files due to the way it determines whether the last chunk has already been considered in the previous overlap. See #3753.
Changes
Added `and start > 0` to the relevant `if` statement, so that a potentially-overlapping chunk is only ignored when it is not the first one.
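For context, the guard works roughly like this (a simplified sketch of a chunking generator, not the actual AutoGPT implementation; parameter names are illustrative):

```python
def split_file(content: str, max_length: int = 4000, overlap: int = 0):
    """Yield successive chunks of `content`, each at most `max_length`
    characters, with `overlap` characters shared between neighbours."""
    start = 0
    while start < len(content):
        chunk = content[start : start + max_length]
        # Skip a trailing chunk that is entirely contained in the previous
        # chunk's overlap -- but only if there *was* a previous chunk.
        # Without the `start > 0` guard, a file shorter than `overlap`
        # would produce no chunks at all (the bug in #3753).
        if len(chunk) <= overlap and start > 0:
            break
        yield chunk
        start += max_length - overlap
```

With the guard, a two-character file split with `overlap=5` still yields one chunk instead of nothing.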
Documentation
As this change does not alter functionality, but merely makes the `if` condition match its intended purpose, the existing documentation comment should be sufficient.
Test Plan
See #3753 for steps to reproduce the issue.
This change does not seem to have impeded data_ingestion.py (which brought it to my attention) or the main AutoGPT file-reading functionality, based on about 30 minutes of testing.
Unless some code relies on getting no output from `split_file` for small files, which did not seem to be the case in my grep of the repo, this change should not interact negatively with other code.
PR Quality Checklist
- [x] My pull request is atomic and focuses on a single change.
- [ ] I have thoroughly tested my changes with multiple different prompts.
- [x] I have considered potential risks and mitigations for my changes.
- [x] I have documented my changes clearly and comprehensively.
- [x] I have not snuck in any "extra" small tweaks.
This change does not add any new functionality, and only fixes previously implemented behavior for the case where a file is shorter than the overlap threshold, so I have not added additional tests.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 31, 2023 2:11pm |
This is a mass message from the AutoGPT core team. Our apologies for the ongoing delay in processing PRs. This is because we are re-architecting the AutoGPT core!
For more details (and for info on joining our Discord), please refer to: https://github.com/Significant-Gravitas/Auto-GPT/wiki/Architecting
I think it would be great if we have a test for that case.
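Such a test might look like the sketch below (hypothetical; `split_file` here is a minimal stand-in with the same `(content, max_length, overlap)` shape as the real function, so the assertions can run in isolation):

```python
def split_file(content, max_length=4000, overlap=0):
    # Minimal stand-in for autogpt.commands.file_operations.split_file,
    # including the `start > 0` guard added by this PR.
    start = 0
    while start < len(content):
        chunk = content[start : start + max_length]
        if len(chunk) <= overlap and start > 0:
            break
        yield chunk
        start += max_length - overlap


def test_split_file_small_file():
    # A file shorter than the overlap must still yield its content.
    assert list(split_file("tiny", max_length=100, overlap=10)) == ["tiny"]


def test_split_file_overlapping_chunks():
    # Normal case: neighbouring chunks share `overlap` characters.
    chunks = list(split_file("abcdefghij", max_length=4, overlap=2))
    assert chunks == ["abcd", "cdef", "efgh", "ghij"]
```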
Codecov Report
Patch coverage: 100.00% and project coverage change: +0.08% :tada:
Comparison is base (63b79a8) 69.65% compared to head (249d302) 69.74%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #3757      +/-   ##
==========================================
+ Coverage   69.65%   69.74%   +0.08%
==========================================
  Files          72       72
  Lines        3523     3523
  Branches      562      562
==========================================
+ Hits         2454     2457       +3
+ Misses        881      878       -3
  Partials      188      188
```
Impacted Files | Coverage Δ |
---|---|
autogpt/commands/file_operations.py | 83.52% <100.00%> (ø) |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
You changed AutoGPT's behaviour. The cassettes have been updated and will be merged to the submodule when this Pull Request gets merged.
`split_file` is not used anywhere anymore, except in its unit test. I propose to remove the function altogether instead of patching it.
I haven’t opened AutoGPT in a while, but as I recall it is referenced in file_operations.py, which may be relevant in the future for processing local data.
And on the design side, I do feel that the data_ingestion functionality (which does use split_file) will be important once the use of memory is more nuanced. See, for instance, the recent GPT Minecraft bot Voyager, which uses a memory of "skill" code; it seems useful to manually add elements to a skill/perspective/context/goal library from file data.
> as I recall it is referenced in file_operations.py
That is where `split_file` is defined, but there is no usage anywhere in the project.
> [...] data_ingestion functionality (which does use split_file)
data_ingestion.py imports `ingest_file` from file_operations.py, which in turn uses `read_file` and `MemoryItem.from_text_file`; `split_file` is not referenced in this process.
> And on the design side, I do feel that data_ingestion functionality (which does use split_file) will be an important functionality once the use of memory is more nuanced (see for instance the recent GPT Minecraft bot Voyager, which uses a memory of "skill" code; it seems useful to manually add elements to a skill/perspective/context/goal library from file data)
That is definitely something we're working on with R&D :)