argo-cd icon indicating copy to clipboard operation
argo-cd copied to clipboard

fix: cross namespace hierarchy traversal from cluster-scoped parents to namespaced children (fixes #24379)

Open jcogilvie opened this issue 4 months ago • 3 comments

Fixes #24379. Fixes #7733 Fixes #8222

This PR adds resource-tree (including UI) support for cluster-scoped parents owning namespaced children. This is a valid kube owner relationship which is sometimes manifested in projects like Crossplane, but in the current version we don't actually capture the children in the resource tree.

Checklist:

  • [x] Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • [x] The title of the PR states what changed and the related issues number (used for the release note).
  • [x] The title of the PR conforms to the Title of the PR
  • [x] I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • [x] I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • [x] Does this PR require documentation updates?
  • [x] I've updated documentation as required by this PR.
  • [x] I have signed off all my commits as required by DCO
  • [x] I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • [x] My build is green (troubleshooting builds).
  • [x] My new feature complies with the feature status guidelines.
  • [x] I have added a brief description of why this PR is necessary and/or what this PR solves.
  • [ ] Optional. My organization is added to USERS.md.
  • [ ] Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

jcogilvie avatar Oct 04 '25 17:10 jcogilvie

:x: Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • :rocket: /bns:deploy to deploy the environment

bunnyshell[bot] avatar Oct 04 '25 17:10 bunnyshell[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :warning: Please upload report for BASE (master@a2659e9). Learn more about missing BASE report. :warning: Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #24847   +/-   ##
=========================================
  Coverage          ?   62.56%           
=========================================
  Files             ?      353           
  Lines             ?    50087           
  Branches          ?        0           
=========================================
  Hits              ?    31337           
  Misses            ?    15731           
  Partials          ?     3019           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 05 '25 23:10 codecov[bot]

Cross-Namespace Hierarchy Traversal

Overview

This branch adds support for traversing resource hierarchies from cluster-scoped parents to their namespaced children across multiple namespaces. The implementation uses a parent-to-children index (parentUIDToChildren) that maps resource UIDs to their direct children, enabling O(1) lookups during hierarchy traversal.

Scope: This feature specifically handles traversing from cluster-scoped parents (e.g., Crossplane Provider) to their namespaced children (e.g., Deployment, Service) across multiple namespaces. Regular within-namespace hierarchy traversal continues to use graph-based traversal.

Implementation

Data Structure

// parentUIDToChildren maps a resource's UID to all its direct children
// Maintained for ALL resources that have owner references
// Used during cross-namespace hierarchy traversal for O(1) lookups
parentUIDToChildren map[types.UID][]kube.ResourceKey

Cache Lifecycle Integration

The index is maintained alongside existing cache structures and follows the same lifecycle:

Incremental Updates (Continuous)

During normal operation, Kubernetes watch events trigger incremental cache updates:

  • setNode(): When a resource is added or updated, the index is updated to reflect new owner references
  • onNodeRemoved(): When a resource is deleted, it is removed from all parent entries in the index

Full Sync (Rare)

During cluster invalidation (Invalidate() or initial sync):

  • All indexes are cleared, including parentUIDToChildren
  • Resources are reloaded from Kubernetes
  • rebuildOrphanedChildrenIndex(): Rebuilds the index by scanning all resources and populating parent-child relationships

Normal Refresh (Frequent, ~3 min)

Standard ArgoCD refresh operations do not touch the cache at all - they only re-run health/sync assessments on existing cached data.

Index Maintenance Functions

addToParentUIDToChildren(parentUID, childKey)

  • Adds a child to a parent's entry in the index
  • Prevents duplicates
  • Called during setNode() and rebuildOrphanedChildrenIndex()

removeFromParentUIDToChildren(parentUID, childKey)

  • Removes a child from a parent's entry
  • Cleans up empty entries
  • Called during onNodeRemoved() and updateParentUIDToChildren()

updateParentUIDToChildren(childKey, oldResource, newResource)

  • Handles owner reference changes when a resource is updated
  • Compares old and new owner refs
  • Removes from old parents, adds to new parents
  • Called during setNode() when updating existing resources

Traversal Algorithm

processCrossNamespaceChildren() For each cluster-scoped parent:

  1. Get parent's UID
  2. Look up direct children via parentUIDToChildren[parentUID] (O(1))
  3. Process each child with the action callback
  4. Recursively traverse children via iterateChildrenUsingIndex()

iterateChildrenUsingIndex(parent, nsNodes, visited, action) Recursive traversal using the index:

  1. Look up parent's direct children via parentUIDToChildren[parent.UID]
  2. For each child in the same namespace:
    • Skip if already visited
    • Execute action callback
    • Recursively process child's descendants

Design Rationale

The index-based approach was chosen over graph-based traversal because:

Graph-based approach (considered but not implemented):

  • Would require building temporary parent-child graphs for each affected namespace
  • O(n × m) complexity where n = affected namespaces, m = resources per namespace
  • Example: 10 affected namespaces × 100 resources = 1,000 operations per traversal
  • High memory allocation for temporary graph structures
  • Scales with percentage of namespaces containing cross-namespace resources (primary dimension)

Index-based approach (current implementation):

  • Maintains a persistent index during normal cache operations
  • O(k) complexity where k = resources with cross-namespace parents
  • Example: 10 affected resources × O(1) lookup = 10 operations per traversal
  • Minimal memory overhead from persistent index
  • Scales with total count of cross-namespace resources (namespace distribution irrelevant)

Scaling Characteristics Comparison

The index-based approach fundamentally changes what dimensions drive performance:

Dimension Graph-Based Index-Based Impact
% of namespaces with cross-NS resources Primary driver (3.5-64x overhead for 4-100%) Minimal impact Index eliminates per-namespace graph building
Total cross-NS resource count Secondary (2.31x for 50 vs 10 resources in 1 namespace) Primary driver (linear scaling) O(1) lookup per resource
Namespace distribution Critical (1.02x for 1 namespace, 11.5x for 10 namespaces) Irrelevant No per-namespace processing
Resources per namespace Important (graph size = namespace size) Minimal impact Only traverse relevant resources

Key insight: The index-based approach makes the feature's performance predictable based solely on the total number of cross-namespace resources in the cluster, regardless of how they're distributed across namespaces. This is a significant improvement over graph-based approaches where namespace distribution is the primary cost driver.

Performance Characteristics

Benchmark Results

Testing with 50 namespaces, 100 resources per namespace, varying percentages of cross-namespace resources:

Scenario Time (μs/op) Memory (B/op) Allocs/op
0% cross-namespace 5.54 6,328 105
2% cross-namespace 5.67 6,328 105
4% cross-namespace 5.67 6,328 105
10% cross-namespace 6.06 6,328 105
20% cross-namespace 7.16 7,648 108
100% cross-namespace 14.46 15,264 112

Performance Scaling

The implementation scales with the total number of cross-namespace resources, not the percentage of namespaces containing them:

Cross-namespace % Time (μs) Overhead vs 0% Total Cross-NS Resources
0% 5.54 baseline 0
2% 5.67 +2.3% 10 (1 namespace)
4% 5.67 +2.3% 20 (2 namespaces)
10% 6.06 +9.4% 50 (5 namespaces)
20% 7.16 +29.2% 100 (10 namespaces)
100% 14.46 +161% 500 (50 namespaces)

Key difference from graph-based approaches: The index-based implementation scales with the total count of cross-namespace resources, NOT with the percentage of namespaces containing them. This is because:

  • Index lookups are O(1) per resource
  • No per-namespace graph building
  • Cost is proportional to total resources traversed, regardless of namespace distribution

This means performance is largely independent of namespace distribution - whether 10 cross-namespace resources are concentrated in 1 namespace or spread across 10 namespaces makes minimal difference.

Memory Efficiency

Memory usage remains efficient across workload types:

Cross-namespace % Memory (B/op) Growth vs 0%
0-10% 6,328 0%
20% 7,648 +21%
100% 15,264 +141%

Memory usage grows modestly because the index is maintained persistently - there's no repeated allocation of temporary graph structures during traversal.

Complexity Analysis

Time Complexity: O(k) where k = number of resources with cross-namespace parents

  • Each resource is visited once via O(1) index lookup
  • No graph construction needed during traversal

Space Complexity: O(r) where r = total resources in cluster

  • Index entry for each resource with owner references
  • ~64 bytes per parent entry + ~48 bytes per child ResourceKey
  • For 5,000 resources with 10% having owner refs: ~32 KB overhead

Real-World Impact

Typical Crossplane Workload (1-2% cross-namespace)

  • Time: ~5.7 μs per traversal
  • Impact: Minimal overhead vs baseline (2-3%)
  • Use case: Crossplane Providers creating namespaced Deployments and Services. Currently, ArgoCD UI shows only the cluster-scoped CRDs generated by Providers; this feature enables displaying the namespaced Deployments and Services as well.

Multi-Provider Setup (5-10% cross-namespace)

  • Time: ~6 μs per traversal
  • Impact: 9% overhead vs baseline
  • Use case: Multiple Crossplane Providers, each generating namespaced resources across several namespaces

Heavy Cross-Namespace Usage (20% cross-namespace)

  • Time: ~7 μs per traversal
  • Impact: 29% overhead vs baseline
  • Use case: Extensive cross-namespace composition patterns

Extreme Cross-Namespace (100% cross-namespace)

  • Time: ~14 μs per traversal
  • Impact: 161% overhead vs baseline
  • Use case: Stress test - all resources have cluster-scoped parents

All overhead figures are still in the microsecond range and represent acceptable performance for the added functionality.

Operational Characteristics

Memory Overhead

The parentUIDToChildren index adds memory overhead proportional to the number of owner references in the cluster:

Memory per entry = ~64 bytes (map entry) + ~48 bytes per child (ResourceKey)

For a typical cluster with 5,000 resources and 10% having owner refs:

500 resources with owners × ~64 bytes = ~32 KB overhead

This is negligible in the context of ArgoCD's overall memory footprint.

CPU Impact

Index maintenance: Near-zero overhead

  • Updates happen during existing setNode() and onNodeRemoved() operations
  • Simple map operations (add/remove/lookup) are O(1)
  • No additional synchronization needed (protected by existing cache lock)

Traversal: Efficient O(k) traversal

  • Direct index lookups replace graph construction
  • Sublinear scaling with cross-namespace resource percentage

Scalability

The implementation scales well with cluster size:

Namespace scaling: Performance is independent of total namespace count

  • Only processes namespaces with relevant resources
  • Index lookup is O(1) regardless of cluster size

Resource scaling: Linear with cross-namespace resources

  • Each cross-namespace resource adds one index entry
  • Traversal time proportional to number of cross-namespace relationships

Testing

Unit Test Coverage

All cross-namespace hierarchy tests pass:

  • TestIterateHierarchyV2_ClusterScopedParent_FindsAllChildren
  • TestIterateHierarchyV2_ClusterScopedParentOnly_InferredUID
  • TestIterateHierarchyV2_DisabledClusterScopedParents
  • TestOrphanedChildrenCleanup
  • TestOrphanedChildrenIndex_OwnerRefLifecycle
  • TestIterateHierarchyV2_NoDuplicatesCrossNamespace

Benchmark Scenarios

Benchmarks test various real-world patterns:

  • Baseline (0%): Pure namespace-scoped resources
  • Low (2%): Typical Crossplane operator workload
  • Medium (10%): Multi-tenant with RBAC policies
  • High (20%): Heavy cross-namespace composition patterns
  • Extreme (100%): Stress test with all resources having cluster-scoped parents
  • Variation tests: Different numbers of cross-namespace references per parent (10, 25, 50)
  • Scale tests: 50 namespaces vs 100 namespaces

Code Location

File: gitops-engine/pkg/cache/cluster.go

Key sections:

  • Lines 312-318: parentUIDToChildren field definition
  • Lines 689-751: Index maintenance helper functions
  • Lines 531-535: Index population in setNode()
  • Lines 1779-1784: Index cleanup in onNodeRemoved()
  • Lines 562-586: Index rebuild in rebuildOrphanedChildrenIndex()
  • Lines 1427-1497: processCrossNamespaceChildren() using index-based traversal
  • Lines 1464-1497: iterateChildrenUsingIndex() recursive traversal

Summary

This implementation adds cross-namespace hierarchy traversal to ArgoCD by:

✅ Maintaining a persistent parent-to-children index for O(1) lookups ✅ Achieving efficient O(k) traversal complexity ✅ Keeping memory overhead minimal (~32 KB for typical clusters) ✅ Integrating cleanly with existing cache lifecycle ✅ Scaling sublinearly with cross-namespace resource percentage ✅ Maintaining microsecond-level performance even under stress scenarios

The index-based approach was chosen over graph-based traversal to avoid O(n×m) complexity and repeated memory allocations during traversal operations.

jcogilvie avatar Oct 09 '25 22:10 jcogilvie

@jcogilvie can you rebase? That should clear some flaky test failures.

crenshaw-dev avatar Dec 14 '25 22:12 crenshaw-dev