# Question about Vizier CR Design: One Vizier per DaemonSet vs Full Namespace Deployment
## Summary
I have questions about the design philosophy behind Pixie's Vizier Custom Resource and would like to understand the rationale for the current architecture, as well as any future plans for supporting multiple PEM configurations within a single cluster.
## Current Behavior vs Expected Behavior

### What I Expected
- One Vizier CR → One PEM DaemonSet with specific node targeting
- Ability to create multiple Vizier CRs in the same namespace for different node types
- Each Vizier CR managing only its corresponding PEM DaemonSet
### What Actually Happens
- One Vizier CR → Full namespace deployment (vizier-cloud-connector, vizier-metadata, vizier-query-broker, AND vizier-pem)
- Multiple Vizier CRs in the same namespace conflict because they share the same hardcoded resource names
- Only one DaemonSet named `vizier-pem` can exist per namespace, with the last-created Vizier overwriting previous ones
## Use Case and Problem
Our Kubernetes cluster contains multiple types of nodes with different hardware configurations:
- GPU nodes: Require higher memory limits for PEM (8Gi)
- CPU nodes: Use standard memory limits for PEM (2Gi)
- Edge nodes: Need minimal resource allocation
Currently, we cannot efficiently deploy PEM with different resource configurations for different node types within a single namespace because:
- Resource Name Conflicts: All Vizier instances use hardcoded names like `vizier-pem` in the YAML templates
- Namespace Isolation Issues: When trying to deploy Vizier in separate namespaces, we encounter ServiceAccount permission issues for cluster-wide resources
- Limited Patches Capability: The current `patches` mechanism doesn't fully address the need for drastically different configurations
## Specific Questions for the Community

### 1. Design Rationale
Why is Vizier designed as a full namespace-level deployment rather than component-specific CRs?
If one Vizier CR represents an entire cluster monitoring stack, what is the purpose of using a Custom Resource instead of a traditional Helm chart or static manifests?
### 2. Multi-Configuration Support
What is the recommended approach for deploying PEM with different configurations for different node types?
Current options seem to be:
- Use a single Vizier with generic patches (limited flexibility)
- Deploy in separate namespaces (ServiceAccount permission issues)
- Modify the source code (not maintainable)
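For reference, here is roughly what option 1 looks like for us today, as a sketch. If I read the CRD correctly, `pemMemoryLimit` applies to the single `vizier-pem` DaemonSet, so there is no way to express "8Gi on GPU nodes, 2Gi elsewhere":

```yaml
# Option 1 sketch: one Vizier CR, one global PEM memory limit.
# pemMemoryLimit is cluster-wide -- every PEM pod gets the same value.
apiVersion: px.dev/v1alpha1
kind: Vizier
metadata:
  name: pixie-vizier
  namespace: pl
spec:
  pemMemoryLimit: "2Gi"  # applies to all PEM pods, regardless of node type
```

A `patches` entry for `vizier-pem` could adjust scheduling, but it still targets the one shared DaemonSet, so per-node-type resource limits remain out of reach.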
### 3. Future Roadmap
Are there plans to support:
- Multiple PEM DaemonSets per namespace with different configurations?
- Dynamic resource naming to avoid conflicts?
- Better multi-tenancy support within a single cluster?
## Technical Details
### Current Architecture

```
One Vizier CR → Creates:
├── vizier-cloud-connector (Deployment)
├── vizier-metadata (StatefulSet/Deployment)
├── vizier-query-broker (Deployment)
└── vizier-pem (DaemonSet) ← The component we want multiple instances of
```
### Desired Architecture

```
Cluster Level:
├── vizier-cloud-connector (Deployment) ← Shared
├── vizier-metadata (StatefulSet) ← Shared
├── vizier-query-broker (Deployment) ← Shared
├── vizier-pem-gpu (DaemonSet) ← GPU nodes, 8Gi memory
├── vizier-pem-cpu (DaemonSet) ← CPU nodes, 2Gi memory
└── vizier-pem-edge (DaemonSet) ← Edge nodes, 1Gi memory
```
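Purely to make the ask concrete, here is a hypothetical spec extension that could express the desired architecture. None of these fields exist in the current CRD; the names are invented for illustration only:

```yaml
# Hypothetical sketch only -- spec.pemGroups does NOT exist today.
# Each group would render its own DaemonSet (vizier-pem-<name>),
# while the control-plane components stay shared.
apiVersion: px.dev/v1alpha1
kind: Vizier
metadata:
  name: pixie-vizier
  namespace: pl
spec:
  pemGroups:
    - name: gpu              # would render vizier-pem-gpu
      memoryLimit: "8Gi"
      nodeSelector:
        nvidia.com/gpu.present: "true"
    - name: cpu              # would render vizier-pem-cpu
      memoryLimit: "2Gi"
    - name: edge             # would render vizier-pem-edge
      memoryLimit: "1Gi"
      nodeSelector:
        node-role.kubernetes.io/edge: "true"
```

This shape would also resolve the naming conflict, since each DaemonSet name derives from its group name rather than being hardcoded.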
### Code References
The hardcoded naming is evident in:
- `k8s/vizier/pem/base/pem_daemonset.yaml` (name: `vizier-pem`)
- `src/operator/controllers/vizier_controller.go` (`deployVizierCore` function)
### Example Configuration Attempted
```yaml
# This creates conflicts because both Viziers try to create resources with the same names
---
apiVersion: px.dev/v1alpha1
kind: Vizier
metadata:
  name: pixie-vizier-gpu
  namespace: pl
spec:
  # GPU-specific configuration
  patches:
    vizier-pem: |
      {"spec": {"template": {"spec": {"affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [{"matchExpressions": [{"key": "nvidia.com/gpu.present", "operator": "In", "values": ["true"]}]}]}}}}}}}
---
apiVersion: px.dev/v1alpha1
kind: Vizier
metadata:
  name: pixie-vizier-cpu
  namespace: pl
spec:
  # CPU-specific configuration
  patches:
    vizier-pem: |
      {"spec": {"template": {"spec": {"affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [{"matchExpressions": [{"key": "nvidia.com/gpu.present", "operator": "DoesNotExist"}]}]}}}}}}}
```
## Environment
- Pixie Operator Version: 0.1.7
- Vizier Version: 0.14.14
- Kubernetes Version: 1.28+
- Deployment Method: Operator
## Community Input Needed
I'd appreciate insights from the maintainers and community about:
- Design Intent: Was this architecture intentional, and what benefits does it provide?
- Workarounds: Are there established patterns for handling multi-configuration scenarios?
- Future Plans: Is multi-DaemonSet support on the roadmap?
- Contributions: Would the community be interested in contributions that enable this functionality?
Thank you for building such an amazing observability platform! Looking forward to understanding the design decisions and potential solutions.
Labels: question, enhancement, operator, vizier