Add Dict Parameter Access Feature for Pipeline Parameters

Summary

This PR adds support for extracting individual values from dictionary pipeline parameters using Pythonic syntax, eliminating the need to pass entire dictionaries to components that only need specific values.

Motivation

Previously, when a pipeline had a dictionary parameter, the entire dictionary had to be passed to every component, even if the component only needed a single value or subset. This led to:

Unnecessary data exposure to components
Reduced code clarity
Potential security concerns with passing sensitive data unnecessarily

Changes

1. New `DictSubvariable` Class (`sdk/python/kfp/dsl/pipeline_channel.py`)

Introduced DictSubvariable class to represent values extracted from dict parameters
Added __getitem__ method to PipelineParameterChannel to enable dict-style access
Supports both single-level and nested dict access
Supports passing sub-dictionaries to components

2. Compiler Support (`sdk/python/kfp/compiler/pipeline_spec_builder.py`)

Extended compiler to recognize DictSubvariable instances
Generates appropriate CEL (Common Expression Language) parameter_expression_selector expressions
Supports nested access by building chained CEL expressions
Works in both task and group contexts

3. Type Checking (`sdk/python/kfp/dsl/pipeline_task.py`)

Modified type compatibility checking to skip strict validation for DictSubvariable
Actual types are resolved at runtime by the CEL evaluator

4. Tests

Added comprehensive unit tests in pipeline_channel_test.py
Added integration test pipeline in test_data/pipeline_files/valid/pipeline_with_dict_parameter_access.py

Usage Examples

Single-Level Access

@dsl.pipeline
def my_pipeline(config: dict):
    # Extract a single value
    component1(db_host=config['db_host'])

Nested Access

@dsl.pipeline
def my_pipeline(config: dict):
    # Access nested values
    component2(host=config['database']['host'])
    component3(username=config['database']['credentials']['username'])

Sub-Dictionary Passing

@dsl.pipeline
def my_pipeline(config: dict):
    # Pass an entire sub-dictionary
    component4(db_config=config['database'])

Technical Details

Compile-Time vs Runtime

Compile-time: The SDK creates DictSubvariable objects and the compiler generates CEL expressions
Runtime: The backend driver evaluates CEL expressions to extract values from the actual JSON data

CEL Expression Generation

Single: parseJson(string_value)["key"]
Nested: parseJson(string_value)["database"]["host"]
Backend already supports these expressions (no backend changes required)

Benefits

Improved Security: Components only receive the data they need
Better Code Clarity: Clear intent about what data each component uses
Reduced Coupling: Components don't depend on entire config structures
Type Safety: Type checking happens at runtime based on actual values
Backward Compatible: Existing code continues to work unchanged

Testing

Run unit tests:

pytest -v sdk/python/kfp/dsl/pipeline_channel_test.py::DictSubvariableTest

Run integration test:

python test_data/pipeline_files/valid/pipeline_with_dict_parameter_access.py

Checklist

[x] Added unit tests
[x] Added integration test
[x] Code follows project formatting standards (isort)
[x] All tests pass
[x] No breaking changes
[x] Documentation (docstrings) added

Related Issues

Closes #12418

Additional Notes

This feature leverages existing backend CEL expression evaluation capabilities, so no server-side changes are required. All changes are SDK-side and applied at compile time.

Nov 07 '25 10:11 wassimbensalem

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign james-jwu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Nov 07 '25 10:11 google-oss-prow[bot]

Hi @wassimbensalem. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Nov 07 '25 10:11 google-oss-prow[bot]

/assign

Nov 07 '25 10:11 wassimbensalem

/ok-to-test

Nov 07 '25 17:11 hbelmiro

/ok-to-test

Nov 10 '25 12:11 hbelmiro

/ok-to-test

Nov 10 '25 14:11 hbelmiro

Is anything still missing here?

Nov 11 '25 14:11 wassimbensalem

@hbelmiro can we test here?

Nov 24 '25 16:11 wassimbensalem

@wassimbensalem I just asked my AI Code assistant to do some analysis on the PR, and as expected it provided some security risks:

Risks

CEL Expression Injection Risks ⚠️ HIGH PRIORITY

● The PR introduces CEL expression generation like parseJson(string_value)["key"]. This presents several injection vectors:

Dynamic Key Injection: If user-controlled data becomes part of the CEL expression keys, malicious actors could potentially:
- Access unintended data: config['../../sensitive_data']
- Cause DoS through complex expressions: config[repeat("x", 1000000)]
JSON Parsing Vulnerabilities: The parseJson() function could be exploited with:
- Malformed JSON leading to parser exploits
- Deeply nested objects causing stack overflow
- Large payloads causing memory exhaustion

Input Validation and Sanitization Issues ⚠️ MEDIUM PRIORITY

From pipeline_channel.py:92-95, I see basic regex validation for names: valid_name_regex = r'^[A-Za-z][A-Za-z0-9\s_-]*$'

Concerns:

Key Name Validation: The PR doesn't specify validation for dict keys used in access patterns
Type Confusion: Runtime resolution could allow type confusion attacks
Path Traversal: Dict access patterns like config['../../../secret'] could bypass intended boundaries

Data Exposure and Access Control Issues ⚠️ HIGH PRIORITY

● The PR claims "Improved Security: Components only receive the data they need," but introduces new risks:

Privilege Escalation: Components could potentially access more data than intended through:
- Nested access patterns: config['system']['admin_password']
- Array/object traversal: config['users'][0]['credentials']
Information Disclosure: Error messages from CEL evaluation could leak:
- Dict structure information
- Sensitive key names
- Value type information

Serialization and Deserialization Vulnerabilities ⚠️ MEDIUM PRIORITY

From pipeline_spec_builder.py:73-85, dict values are converted to protobuf. The new feature adds:

Deserialization Attacks: JSON parsing in CEL could be exploited with malicious payloads
Memory Exhaustion: Large nested dicts could cause resource exhaustion

Security Recommendations

Immediate Actions Required:

Implement Strict Key Validation: - Whitelist allowed characters in dict keys - Prevent path traversal patterns (../, absolute paths) - Limit key length and nesting depth
Add CEL Expression Sanitization: - Validate generated CEL expressions before evaluation - Implement expression complexity limits - Add timeout protection for CEL evaluation
Enhance Access Control: - Document which dict keys should be accessible to which components - Consider implementing a permission model for dict access - Add audit logging for dict access patterns

Long-term Considerations:

Security Testing: Add specific test cases for injection attempts and edge cases
Documentation: Clearly document security boundaries and recommended usage patterns
Monitoring: Implement runtime monitoring for suspicious access patterns

Verdict

While the feature improves functionality, the current implementation introduces significant security risks that need addressing before production deployment. The CEL injection vectors and lack of proper input validation are particularly concerning in multi-tenant environments.

Recommendation: Request security review and implementation of the above mitigations before merging PR #12419.

fyi @HumairAK @droctothorpe @mprahl @zazulam

Dec 04 '25 21:12 nsingla

feat(sdk): Add dict parameter access for granular value extraction

Add Dict Parameter Access Feature for Pipeline Parameters

Summary

Motivation

Changes

1. New DictSubvariable Class (sdk/python/kfp/dsl/pipeline_channel.py)

2. Compiler Support (sdk/python/kfp/compiler/pipeline_spec_builder.py)

3. Type Checking (sdk/python/kfp/dsl/pipeline_task.py)

4. Tests

Usage Examples

Single-Level Access

Nested Access

Sub-Dictionary Passing

Technical Details

Compile-Time vs Runtime

CEL Expression Generation

Benefits

Testing

Checklist

Related Issues

Additional Notes

Risks

Security Recommendations

Immediate Actions Required:

Long-term Considerations:

Verdict

1. New `DictSubvariable` Class (`sdk/python/kfp/dsl/pipeline_channel.py`)

2. Compiler Support (`sdk/python/kfp/compiler/pipeline_spec_builder.py`)

3. Type Checking (`sdk/python/kfp/dsl/pipeline_task.py`)