Oxen icon indicating copy to clipboard operation
Oxen copied to clipboard

Refactor CommitMerkleTree and repositories::tree

Open rpschoenburg opened this issue 1 month ago • 2 comments

Refactor the merkle tree loading modules in CommitMerkleTree and repositories tree to deduplicate code and implement a consistent naming scheme. Functions that previously got but did not require a CommitMerkleTree object now instead get a root from repositories::tree. In general, all functions go through repositories::tree rather than directly accessing CommitMerkleTree where possible

Summary by CodeRabbit

  • Bug Fixes

    • Push command now writes errors to stderr.
    • Improved error messaging when repository trees are missing.
  • Improvements

    • File restoration now also verifies file size (in addition to modification time) to avoid incorrect restores.
    • Refactored tree/directory traversal and subtree loading for more consistent and reliable behavior.

rpschoenburg avatar Nov 06 '25 10:11 rpschoenburg

Walkthrough

Refactors Merkle-tree access: replaces direct CommitMerkleTree usage with centralized repositories::tree APIs, adds depth/partial-node/hash-collection surfaces (HashSet-based), propagates new optional dir_hashes parameter to many call sites, and surfaces file size in partial-node/restore logic.

Changes

Cohort / File(s) Change Summary
CLI command callers
oxen-rust/src/cli/src/cmd/ls.rs, oxen-rust/src/cli/src/cmd/node.rs, oxen-rust/src/cli/src/cmd/tree.rs
Replaced direct CommitMerkleTree calls with repositories::tree helpers; updated call sites to new signatures (added extra argument like None); added refactor comments and removed unused imports.
Repository tree API (core)
oxen-rust/src/lib/src/repositories/tree.rs
Major: added public APIs for root/subtree loading with hash collection and partial nodes (e.g., get_root_with_children_and_node_hashes, get_root_with_children_and_partial_nodes, get_subtree_by_depth*, get_ancestor_nodes, list_missing_node_hashes); migrated internal tracking from HashMap to HashSet; exposed PartialNode.
Commit Merkle tree constructors / loaders
oxen-rust/src/lib/src/core/v_latest/index/commit_merkle_tree.rs
Major: added constructors/loaders (from_commit, from_path, read_from_path*, read_node_and_collect_*, depth/path-aware loaders) and updated root/dir methods to accept hash-collection parameters and optional dir_hashes.
Call-site migrations (many modules)
oxen-rust/src/lib/src/core/v_latest/*, oxen-rust/src/lib/src/repositories/*, oxen-rust/src/lib/src/api/*, oxen-rust/src/cli/src/cmd/*, oxen-rust/src/lib/src/util/fs.rs, ...
Updated widespread call sites to use the new repositories::tree APIs and signatures (pass None for new optional params where appropriate); replaced uses of CommitMerkleTree helpers with repository helpers; adapted to Option returns and added explicit "Merkle tree not found" errors in some places.
Hash-tracking & fetch/push changes
oxen-rust/src/lib/src/core/v_latest/fetch.rs, oxen-rust/src/lib/src/core/v_latest/push.rs
Switched internal hash collections to HashSet and refactored subtree/hash-collection flows; adapted per-path subtree traversal and unique/shared hash aggregation.
PartialNode & metadata
oxen-rust/src/lib/src/model/partial_node.rs
Added pub size: u64 to PartialNode and updated constructor to accept and set size.
Restore/merge/branches/index logic
oxen-rust/src/lib/src/core/v_latest/restore.rs, branches.rs, merge.rs, df.rs
Refactored to use repository tree loaders (e.g., get_node_by_path_with_children, get_root_with_children_and_partial_nodes); added size-aware checks for restoration decisions; updated error messages and unwrap/Option handling.
Workspace / data-frame callers
oxen-rust/src/lib/src/core/v_latest/workspaces/*, oxen-rust/src/lib/src/api/client/workspaces/files.rs, oxen-rust/src/lib/src/repositories/workspaces/data_frames.rs
Replaced CommitMerkleTree::from_path usages with repositories::tree::get_node_by_path_with_children and adjusted flow to handle Option results; updated get_dir_with_children call sites to pass new param.
Download / entries / rm / util
oxen-rust/src/lib/src/core/v_latest/download.rs, entries.rs, rm.rs, oxen-rust/src/lib/src/core/v_old/v0_19_0/entries.rs, oxen-rust/src/lib/src/util/fs.rs
Added extra None argument to get_dir_with_children*/get_dir_without_children/get_dir_with_children_recursive call sites; preserved behavior.
Commit writer / remote-mode checkout & restore
oxen-rust/src/lib/src/repositories/commits/commit_writer.rs, oxen-rust/src/lib/src/repositories/remote_mode/checkout.rs, .../restore.rs
Use repositories::tree helpers for listing and root construction; populate partial-node maps; adjust ancestor-node computations and storage flows.
Small cleanups
oxen-rust/src/lib/src/repositories.rs, oxen-rust/src/lib/src/api/client/tree.rs
Removed unused imports; minor comment/whitespace tweaks.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as Caller (cmd/core/etc.)
    participant ReposTree as repositories::tree
    participant OldCMT as CommitMerkleTree (impl)
    participant Store as Merkle storage

    rect rgb(220, 240, 255)
    Note over Caller,OldCMT: Old flow (many callers)
    Caller->>OldCMT: CommitMerkleTree::from_path(repo, commit, path, load_recursive)
    OldCMT->>Store: read nodes/hashes
    Store-->>OldCMT: Merkle nodes
    OldCMT-->>Caller: MerkleTree wrapper
    end

    rect rgb(240, 220, 255)
    Note over Caller,ReposTree: New flow (centralized)
    Caller->>ReposTree: get_node_by_path_with_children(repo, commit, path, [opts])
    ReposTree->>OldCMT: (internal) read_from_path_maybe / read_node_*
    OldCMT->>Store: read nodes/hashes (depth/partial)
    Store-->>OldCMT: Merkle nodes
    OldCMT-->>ReposTree: Node(s), collected hashes/partials
    ReposTree-->>Caller: Option<Node> + updated hash/partial collections
    end

    rect rgb(220, 255, 230)
    Note over Caller,ReposTree: Partial/hash collection variant
    Caller->>ReposTree: get_root_with_children_and_partial_nodes(repo, commit, base?, unique?, shared?, partial_nodes)
    ReposTree->>Store: read selective nodes (depth/path)
    Store-->>ReposTree: PartialNode entries (hash, mtime, size)
    ReposTree-->>Caller: Root + partial_nodes map + hash sets
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas to focus review on:

  • repositories/tree.rs (new public APIs, HashSet vs HashMap changes, option/unwrap semantics)
  • core/v_latest/index/commit_merkle_tree.rs (many new constructors/readers; depth/partial logic)
  • branches.rs, merge.rs, restore.rs (size-aware restore decisions and merge flows)
  • fetch.rs / push.rs (hash collection migration and per-path aggregation)
  • Multiple call-site updates passing new optional params — verify correct None/Some use and error conversions

Possibly related PRs

  • Oxen-AI/Oxen#652 — closely related: adds/propagates dir_hashes parameter and caller refactors to repositories::tree.
  • Oxen-AI/Oxen#630 — related: partial/depth-based subtree loads and hash/partial-node collection changes.
  • Oxen-AI/Oxen#596 — related: subtree checkout/restore refactors and updated Merkle-tree access patterns.

Suggested reviewers

  • jcelliott
  • subygan

Poem

🐰
Through branches, hashes, depth and root,
I hopped through nodes in tiny suit.
Partial sizes counted true,
Central trees now guide us through—
A carrot-coded refactor, new! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.30% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective of this changeset - a refactoring of CommitMerkleTree and repositories::tree modules to deduplicate code and route functions through a centralized API.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch API-8/repositories_tree_refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 06 '25 10:11 coderabbitai[bot]

Notes on testing:

This PR isn't very specific to any workflow, so I'm not sure there's a particular way it needs to be tested. I think the biggest specific changes were probably with oxen push, and with repos cloned with subtree paths, so I'd recommend doing clone/push/pull with that at least once.

rpschoenburg avatar Nov 06 '25 19:11 rpschoenburg