kubo
kubo copied to clipboard
[skip changelog] Optimize dag stat with DP caching
Summary
Implements dynamic programming cache for CID traversal in ipfs dag stat to avoid redundant traversals when multiple DAGs share common subgraphs.
Resolves TODO in core/commands/dag/stat.go:17-18
Changes
Core Implementation
- Added
cidStatCachestructure to memoize subtree statistics for each CID - Replaced linear traversal with recursive DP algorithm that:
- Checks cache before traversing any node
- Skips entire subtree traversal for cached CIDs
- Computes and caches subtree stats (size + block count) for each node
- Maintains correct accounting for TotalSize, SharedSize, and deduplication ratios
Testing
- Added
TestDagStatCachingwith two test cases:- Cache consistency when querying duplicate CIDs
- Correct deduplication stats with shared subgraphs
Performance Impact
Time Complexity:
- Before: O(N × M) where N = unique CIDs, M = times each appears
- After: O(N) - each unique CID traversed once
Triage: Will review. This may be useful for pinner and for dag import, possibly by caching different data. Review should assess additional use cases.