chroma icon indicating copy to clipboard operation
chroma copied to clipboard

[ENH]: cleanup HNSW temporary files after registering result

Open codetheweb opened this issue 1 year ago • 6 comments

Description of changes

Cleans up temporary HNSW files after registering the result with the sysdb.

A previous iteration of this deleted the files immediately after flushing, but because previous steps may be retried if registration fails we decided this approach was safer.

Will fix compactor nodes running out of disk space.

Test plan

How are these changes tested?

Ran compaction service locally and confirmed that temporary files were deleted.

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

n/a

codetheweb avatar Aug 08 '24 20:08 codetheweb

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • [ ] Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • [ ] Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • [ ] If appropriate, are there adequate property based tests?
  • [ ] If appropriate, are there adequate unit tests?
  • [ ] Should any logging, debugging, tracing information be added or removed?
  • [ ] Are error messages user-friendly?
  • [ ] Have all documentation changes needed been made?
  • [ ] Have all non-obvious changes been commented?

System Compatibility

  • [ ] Are there any potential impacts on other parts of the system or backward compatibility?
  • [ ] Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • [ ] Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

github-actions[bot] avatar Aug 08 '24 20:08 github-actions[bot]

Clarifying - The manager could just use an api that deletes all, why does it have to keep track?

On Thu, Aug 8, 2024 at 1:24 PM Max Isom @.***> wrote:

@.**** commented on this pull request.

In rust/worker/src/execution/orchestration/compact.rs https://github.com/chroma-core/chroma/pull/2646#discussion_r1710240569:

@@ -102,6 +102,7 @@ pub struct CompactOrchestrator { Option<tokio::sync::oneshot::Sender<Result<CompactionResponse, Box<dyn ChromaError>>>>, // Current max offset id. curr_max_offset_id: Arc<AtomicU32>,

  • hnsw_index_id: Option<Uuid>,

Hammad and I discussed a potentially simpler solution where the compaction manager deletes temporary files after all jobs have been completed. However, at that point the manager needs to know the specific HNSW index IDs to delete, so we would have to add this plumbing anyways.

— Reply to this email directly, view it on GitHub https://github.com/chroma-core/chroma/pull/2646#pullrequestreview-2228793995, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKW32MCJBZLGMDJRXQ7STDZQPHX7AVCNFSM6AAAAABMHF6KQ2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMRYG44TGOJZGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

HammadB avatar Aug 08 '24 20:08 HammadB

Clarifying - The manager could just use an api that deletes all, why does it have to keep track?

What knows which IDs are included in "all"? foyer doesn't provide an API to fetch all keys. We could delete the entire parent temporary directory; I was wary of this because current usage seems a bit ambiguous on whether this is allowed to be a top-level system directory (/tmp) (similar issue for deleting all contents of specified directory).

codetheweb avatar Aug 08 '24 20:08 codetheweb

Ah - I presumed the cache would have an affordance to get all keys / iterate or purge itself

On Thu, Aug 8, 2024 at 1:30 PM Max Isom @.***> wrote:

Clarifying - The manager could just use an api that deletes all, why does it have to keep track?

What knows which IDs are included in "all"? foyer doesn't provide an API to fetch all keys. We could delete the entire parent temporary directory; I was wary of this because current usage seems a bit ambiguous on whether this is allowed to be a top-level system directory (/tmp).

— Reply to this email directly, view it on GitHub https://github.com/chroma-core/chroma/pull/2646#issuecomment-2276602468, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKW32LEV36TB5NN7AJF67TZQPIP7AVCNFSM6AAAAABMHF6KQ2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZWGYYDENBWHA . You are receiving this because you commented.Message ID: @.***>

HammadB avatar Aug 08 '24 20:08 HammadB

Ah - I presumed the cache would have an affordance to get all keys / iterate or purge itself

foyer has clean(), but it doesn't return the keys that were removed. Looking at it again though, I think this would work:

while let Some((k, index)) = cache.pop() {
  let index_path = ""; // assemble path
  std::fs::remove_dir_all(index_path).await;
}

codetheweb avatar Aug 08 '24 20:08 codetheweb

Alternative approach minus some cleanup in 677de8fc, not sure if I like it more.

codetheweb avatar Aug 08 '24 21:08 codetheweb

can we implement drop?

codetheweb avatar Aug 20 '24 18:08 codetheweb