Refactor CommitMerkleTree and repositories::tree
Refactor the merkle tree loading modules in CommitMerkleTree and repositories tree to deduplicate code and implement a consistent naming scheme. Functions that previously got but did not require a CommitMerkleTree object now instead get a root from repositories::tree. In general, all functions go through repositories::tree rather than directly accessing CommitMerkleTree where possible
Summary by CodeRabbit
-
Bug Fixes
- Push command now writes errors to stderr.
- Improved error messaging when repository trees are missing.
-
Improvements
- File restoration now also verifies file size (in addition to modification time) to avoid incorrect restores.
- Refactored tree/directory traversal and subtree loading for more consistent and reliable behavior.
Walkthrough
Refactors Merkle-tree access: replaces direct CommitMerkleTree usage with centralized repositories::tree APIs, adds depth/partial-node/hash-collection surfaces (HashSet-based), propagates new optional dir_hashes parameter to many call sites, and surfaces file size in partial-node/restore logic.
Changes
| Cohort / File(s) | Change Summary |
|---|---|
CLI command callers oxen-rust/src/cli/src/cmd/ls.rs, oxen-rust/src/cli/src/cmd/node.rs, oxen-rust/src/cli/src/cmd/tree.rs |
Replaced direct CommitMerkleTree calls with repositories::tree helpers; updated call sites to new signatures (added extra argument like None); added refactor comments and removed unused imports. |
Repository tree API (core) oxen-rust/src/lib/src/repositories/tree.rs |
Major: added public APIs for root/subtree loading with hash collection and partial nodes (e.g., get_root_with_children_and_node_hashes, get_root_with_children_and_partial_nodes, get_subtree_by_depth*, get_ancestor_nodes, list_missing_node_hashes); migrated internal tracking from HashMap to HashSet; exposed PartialNode. |
Commit Merkle tree constructors / loaders oxen-rust/src/lib/src/core/v_latest/index/commit_merkle_tree.rs |
Major: added constructors/loaders (from_commit, from_path, read_from_path*, read_node_and_collect_*, depth/path-aware loaders) and updated root/dir methods to accept hash-collection parameters and optional dir_hashes. |
Call-site migrations (many modules) oxen-rust/src/lib/src/core/v_latest/*, oxen-rust/src/lib/src/repositories/*, oxen-rust/src/lib/src/api/*, oxen-rust/src/cli/src/cmd/*, oxen-rust/src/lib/src/util/fs.rs, ... |
Updated widespread call sites to use the new repositories::tree APIs and signatures (pass None for new optional params where appropriate); replaced uses of CommitMerkleTree helpers with repository helpers; adapted to Option returns and added explicit "Merkle tree not found" errors in some places. |
Hash-tracking & fetch/push changes oxen-rust/src/lib/src/core/v_latest/fetch.rs, oxen-rust/src/lib/src/core/v_latest/push.rs |
Switched internal hash collections to HashSet and refactored subtree/hash-collection flows; adapted per-path subtree traversal and unique/shared hash aggregation. |
PartialNode & metadata oxen-rust/src/lib/src/model/partial_node.rs |
Added pub size: u64 to PartialNode and updated constructor to accept and set size. |
Restore/merge/branches/index logic oxen-rust/src/lib/src/core/v_latest/restore.rs, branches.rs, merge.rs, df.rs |
Refactored to use repository tree loaders (e.g., get_node_by_path_with_children, get_root_with_children_and_partial_nodes); added size-aware checks for restoration decisions; updated error messages and unwrap/Option handling. |
Workspace / data-frame callers oxen-rust/src/lib/src/core/v_latest/workspaces/*, oxen-rust/src/lib/src/api/client/workspaces/files.rs, oxen-rust/src/lib/src/repositories/workspaces/data_frames.rs |
Replaced CommitMerkleTree::from_path usages with repositories::tree::get_node_by_path_with_children and adjusted flow to handle Option results; updated get_dir_with_children call sites to pass new param. |
Download / entries / rm / util oxen-rust/src/lib/src/core/v_latest/download.rs, entries.rs, rm.rs, oxen-rust/src/lib/src/core/v_old/v0_19_0/entries.rs, oxen-rust/src/lib/src/util/fs.rs |
Added extra None argument to get_dir_with_children*/get_dir_without_children/get_dir_with_children_recursive call sites; preserved behavior. |
Commit writer / remote-mode checkout & restore oxen-rust/src/lib/src/repositories/commits/commit_writer.rs, oxen-rust/src/lib/src/repositories/remote_mode/checkout.rs, .../restore.rs |
Use repositories::tree helpers for listing and root construction; populate partial-node maps; adjust ancestor-node computations and storage flows. |
Small cleanups oxen-rust/src/lib/src/repositories.rs, oxen-rust/src/lib/src/api/client/tree.rs |
Removed unused imports; minor comment/whitespace tweaks. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Caller as Caller (cmd/core/etc.)
participant ReposTree as repositories::tree
participant OldCMT as CommitMerkleTree (impl)
participant Store as Merkle storage
rect rgb(220, 240, 255)
Note over Caller,OldCMT: Old flow (many callers)
Caller->>OldCMT: CommitMerkleTree::from_path(repo, commit, path, load_recursive)
OldCMT->>Store: read nodes/hashes
Store-->>OldCMT: Merkle nodes
OldCMT-->>Caller: MerkleTree wrapper
end
rect rgb(240, 220, 255)
Note over Caller,ReposTree: New flow (centralized)
Caller->>ReposTree: get_node_by_path_with_children(repo, commit, path, [opts])
ReposTree->>OldCMT: (internal) read_from_path_maybe / read_node_*
OldCMT->>Store: read nodes/hashes (depth/partial)
Store-->>OldCMT: Merkle nodes
OldCMT-->>ReposTree: Node(s), collected hashes/partials
ReposTree-->>Caller: Option<Node> + updated hash/partial collections
end
rect rgb(220, 255, 230)
Note over Caller,ReposTree: Partial/hash collection variant
Caller->>ReposTree: get_root_with_children_and_partial_nodes(repo, commit, base?, unique?, shared?, partial_nodes)
ReposTree->>Store: read selective nodes (depth/path)
Store-->>ReposTree: PartialNode entries (hash, mtime, size)
ReposTree-->>Caller: Root + partial_nodes map + hash sets
end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Areas to focus review on:
- repositories/tree.rs (new public APIs, HashSet vs HashMap changes, option/unwrap semantics)
- core/v_latest/index/commit_merkle_tree.rs (many new constructors/readers; depth/partial logic)
- branches.rs, merge.rs, restore.rs (size-aware restore decisions and merge flows)
- fetch.rs / push.rs (hash collection migration and per-path aggregation)
- Multiple call-site updates passing new optional params — verify correct
None/Someuse and error conversions
Possibly related PRs
- Oxen-AI/Oxen#652 — closely related: adds/propagates
dir_hashesparameter and caller refactors torepositories::tree. - Oxen-AI/Oxen#630 — related: partial/depth-based subtree loads and hash/partial-node collection changes.
- Oxen-AI/Oxen#596 — related: subtree checkout/restore refactors and updated Merkle-tree access patterns.
Suggested reviewers
- jcelliott
- subygan
Poem
🐰
Through branches, hashes, depth and root,
I hopped through nodes in tiny suit.
Partial sizes counted true,
Central trees now guide us through—
A carrot-coded refactor, new! 🥕
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.30% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of this changeset - a refactoring of CommitMerkleTree and repositories::tree modules to deduplicate code and route functions through a centralized API. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch
API-8/repositories_tree_refactor
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Notes on testing:
This PR isn't very specific to any workflow, so I'm not sure there's a particular way it needs to be tested. I think the biggest specific changes were probably with oxen push, and with repos cloned with subtree paths, so I'd recommend doing clone/push/pull with that at least once.