kubo icon indicating copy to clipboard operation
kubo copied to clipboard

[skip changelog] Optimize dag stat with DP caching

Open sneaxhuh opened this issue 3 weeks ago • 1 comments

Summary

Implements dynamic programming cache for CID traversal in ipfs dag stat to avoid redundant traversals when multiple DAGs share common subgraphs.

Resolves TODO in core/commands/dag/stat.go:17-18

Changes

Core Implementation

  • Added cidStatCache structure to memoize subtree statistics for each CID
  • Replaced linear traversal with recursive DP algorithm that:
    • Checks cache before traversing any node
    • Skips entire subtree traversal for cached CIDs
    • Computes and caches subtree stats (size + block count) for each node
  • Maintains correct accounting for TotalSize, SharedSize, and deduplication ratios

Testing

  • Added TestDagStatCaching with two test cases:
    • Cache consistency when querying duplicate CIDs
    • Correct deduplication stats with shared subgraphs

Performance Impact

Time Complexity:

  • Before: O(N × M) where N = unique CIDs, M = times each appears
  • After: O(N) - each unique CID traversed once

sneaxhuh avatar Dec 02 '25 13:12 sneaxhuh

Triage: Will review. This may be useful for pinner and for dag import, possibly by caching different data. Review should assess additional use cases.

gammazero avatar Dec 02 '25 15:12 gammazero