[ENH]: cleanup HNSW temporary files after registering result
## Description of changes
Cleans up temporary HNSW files after registering the result with the sysdb.
A previous iteration of this PR deleted the files immediately after flushing, but because earlier steps may be retried if registration fails, we decided that deferring cleanup until after registration succeeds is the safer approach.
Will fix compactor nodes running out of disk space.
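As a rough illustration of the ordering this PR enforces, here is a minimal sketch; `register_with_sysdb`, the error type, and the function name are placeholders rather than the actual orchestrator API. The point is that the temporary HNSW files are only removed once the sysdb registration has succeeded, so earlier steps can still be retried against the files on disk.

```rust
use std::path::PathBuf;

// Illustrative sketch only: `register_with_sysdb` and the error type are
// placeholders, not the actual Chroma APIs.
async fn register_then_cleanup(
    register_with_sysdb: impl std::future::Future<Output = Result<(), String>>,
    hnsw_temp_paths: Vec<PathBuf>,
) -> Result<(), String> {
    // Keep the temporary files until registration succeeds: if it fails,
    // earlier steps may be retried and still need the files on disk.
    register_with_sysdb.await?;

    // Registration is durable now, so the temporary HNSW files can be removed.
    for path in hnsw_temp_paths {
        if let Err(e) = tokio::fs::remove_dir_all(&path).await {
            eprintln!("failed to remove temporary HNSW files at {path:?}: {e}");
        }
    }
    Ok(())
}
```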
## Test plan
*How are these changes tested?*
Ran the compaction service locally and confirmed that the temporary files were deleted.
## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?*
n/a
## Reviewer Checklist
*Please leverage this checklist to ensure your code review is thorough before approving*
### Testing, Bugs, Errors, Logs, Documentation
- [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
- [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
- [ ] If appropriate, are there adequate property based tests?
- [ ] If appropriate, are there adequate unit tests?
- [ ] Should any logging, debugging, tracing information be added or removed?
- [ ] Are error messages user-friendly?
- [ ] Have all documentation changes needed been made?
- [ ] Have all non-obvious changes been commented?
### System Compatibility
- [ ] Are there any potential impacts on other parts of the system or backward compatibility?
- [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?
### Quality
- [ ] Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?
In rust/worker/src/execution/orchestration/compact.rs (https://github.com/chroma-core/chroma/pull/2646#discussion_r1710240569):

```diff
@@ -102,6 +102,7 @@ pub struct CompactOrchestrator {
     Option<tokio::sync::oneshot::Sender<Result<CompactionResponse, Box<dyn ChromaError>>>>,
     // Current max offset id.
     curr_max_offset_id: Arc<AtomicU32>,
+    hnsw_index_id: Option<Uuid>,
```

Hammad and I discussed a potentially simpler solution where the compaction manager deletes temporary files after all jobs have been completed. However, at that point the manager needs to know the specific HNSW index IDs to delete, so we would have to add this plumbing anyway.

Clarifying - The manager could just use an api that deletes all, why does it have to keep track?
What knows which IDs are included in "all"? foyer doesn't provide an API to fetch all keys. We could delete the entire parent temporary directory, but I was wary of this because current usage is a bit ambiguous about whether that directory is allowed to be a top-level system directory like /tmp (deleting all of its contents has the same problem).
Ah - I presumed the cache would have an affordance to get all keys / iterate or purge itself
foyer has `clean()`, but it doesn't return the keys that were removed. Looking at it again though, I think this would work:

```rust
while let Some((k, index)) = cache.pop() {
    let index_path = ""; // assemble path from the index ID
    // tokio::fs is used here because std::fs::remove_dir_all is blocking and cannot be awaited
    tokio::fs::remove_dir_all(index_path).await?;
}
```
Alternative approach (minus some cleanup) in 677de8fc; not sure if I like it more.
Can we implement `Drop`?
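A minimal sketch of what the `Drop` suggestion could look like, assuming a wrapper type that owns the temporary index directory; the type and field names here are hypothetical, not part of the PR. One caveat: `Drop::drop` cannot be async, so it would have to fall back to the blocking `std::fs` call.

```rust
use std::path::PathBuf;

// Hypothetical wrapper owning a temporary HNSW index directory.
struct TempHnswIndexDir {
    path: PathBuf,
}

impl Drop for TempHnswIndexDir {
    fn drop(&mut self) {
        // Best-effort blocking cleanup; Drop cannot await an async removal.
        if let Err(e) = std::fs::remove_dir_all(&self.path) {
            eprintln!("failed to remove temporary HNSW files at {:?}: {}", self.path, e);
        }
    }
}
```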