[sdk] Feature Request: Dict Parameter Access for Pipeline Parameters
Problem Statement
Currently, when a pipeline has a dictionary parameter, the entire dictionary must be passed to every component, even if the component only needs a single value or subset of values. This leads to:
- Unnecessary data exposure: Components receive more data than they need
- Reduced code clarity: It's unclear which parts of the config each component uses
- Security concerns: Sensitive data may be unnecessarily exposed to components
- Tight coupling: Components become dependent on the entire config structure
Proposed Solution
Add support for extracting individual values from dictionary pipeline parameters using Pythonic dict-style syntax:
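```python
from kfp import dsl

# Proposed syntax (not supported by the SDK at the time of this request);
# component1-3 stand for components defined elsewhere.
@dsl.pipeline
def my_pipeline(config: dict):
    # Single-level access
    component1(db_host=config['db_host'])
    # Nested access
    component2(host=config['database']['host'])
    # Sub-dict passing
    component3(db_config=config['database'])
```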
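One way the SDK could support this syntax is to give the dict parameter's compile-time placeholder a `__getitem__` method. That method would record the key path instead of reading a value, so nested access and sub-dict passing both reduce to key paths of different lengths. The following is a minimal sketch under that assumption. The `DictParamPlaceholder` class is hypothetical and is not part of the current KFP SDK:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DictParamPlaceholder:
    """Hypothetical compile-time stand-in for a dict pipeline parameter.

    Indexing reads no data; it only records the key path so that a compiler
    could later turn it into an expression evaluated at runtime.
    """

    param_name: str
    key_path: Tuple[str, ...] = ()

    def __getitem__(self, key: str) -> "DictParamPlaceholder":
        if not isinstance(key, str):
            raise TypeError(f"dict parameter keys must be str, got {type(key).__name__}")
        # Return a new placeholder with the key appended; self is left unchanged.
        return DictParamPlaceholder(self.param_name, self.key_path + (key,))


config = DictParamPlaceholder("config")
print(config["db_host"].key_path)           # ('db_host',)
print(config["database"]["host"].key_path)  # ('database', 'host')
print(config["database"].key_path)          # ('database',)
```

Because each lookup returns a new immutable placeholder, `config` itself can still be passed whole to a component, exactly as pipelines do today.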
Use Cases
- Configuration Management: Pass a large config dict to the pipeline, but only extract specific values for each component
- Security: Ensure components only receive the data they need
- Code Organization: Improve clarity about what data each component uses
- Nested Configs: Handle complex nested configuration structures
Expected Behavior
- Support single-level dict access: config['key']
- Support nested dict access: config['level1']['level2']
- Support passing sub-dictionaries: config['subdict']
- Generate appropriate CEL expressions at compile time
- Runtime evaluation by the existing backend CEL evaluator (no backend changes needed)
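The compile-time step could turn a recorded key path into a CEL expression string of the parseJson(...)["key"] form mentioned in this issue. The following is a minimal sketch under that assumption. The function name `build_cel_expression` is hypothetical. The `input_ref` argument stands for whatever expression the compiler would use to refer to the raw JSON string of the parameter; the value `config_json` used below is only a placeholder, not KFP's real reference format:

```python
import json
from typing import Sequence


def build_cel_expression(input_ref: str, key_path: Sequence[str]) -> str:
    """Build a CEL expression that extracts key_path from a JSON-encoded dict.

    An empty key_path means the whole dict is passed through unchanged, so
    pipelines that never index into the parameter compile exactly as before.
    """
    if not key_path:
        return input_ref
    # json.dumps produces a double-quoted string literal with backslash
    # escapes, which CEL string literals also accept.
    selectors = "".join(f"[{json.dumps(key, ensure_ascii=False)}]" for key in key_path)
    return f"parseJson({input_ref}){selectors}"


print(build_cel_expression("config_json", ["db_host"]))
# parseJson(config_json)["db_host"]
print(build_cel_expression("config_json", ["database", "host"]))
# parseJson(config_json)["database"]["host"]
print(build_cel_expression("config_json", []))
# config_json
```

Quoting keys through `json.dumps` rather than plain string formatting keeps keys that contain quotes or backslashes from producing a malformed expression.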
Alternatives Considered
- Manual extraction in pipeline: Create intermediate variables - verbose and error-prone
- Component-level filtering: Components filter what they need - still exposes all data
- Separate parameters: Split config into many parameters - breaks encapsulation
Additional Context
This feature would leverage existing backend CEL (Common Expression Language) expression evaluation capabilities. The backend already supports parseJson(string_value)["key"] expressions, so this would be an SDK-side enhancement that generates the appropriate CEL expressions at compile time.
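At runtime, an expression such as parseJson(s)["database"]["host"] amounts to decoding the JSON string and indexing into the result. The type of the extracted value (string, number, or nested dict) depends on the data, which is why type resolution would happen at runtime. The following Python model illustrates those semantics under that assumption. The name `evaluate_key_path` is hypothetical, and the code is not the backend's actual evaluator:

```python
import json
from typing import Any, Sequence


def evaluate_key_path(json_value: str, key_path: Sequence[str]) -> Any:
    """Python model of what parseJson(json_value)[k1][k2]... yields at runtime."""
    value: Any = json.loads(json_value)
    for key in key_path:
        if not isinstance(value, dict):
            raise TypeError(f"cannot index {type(value).__name__} with key {key!r}")
        # value[key] raises KeyError for a missing key, mirroring CEL's
        # "no such key" evaluation error instead of silently returning None.
        value = value[key]
    return value


raw = '{"db_host": "db.internal", "database": {"host": "pg", "port": 5432}}'
print(evaluate_key_path(raw, ["db_host"]))           # db.internal
print(evaluate_key_path(raw, ["database", "port"]))  # 5432
print(evaluate_key_path(raw, ["database"]))          # {'host': 'pg', 'port': 5432}
```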
Implementation Notes
- Changes would be SDK-side only (Python)
- No backend changes required
- Fully backward compatible
- Compile-time transformation to CEL expressions
- Runtime type resolution via CEL evaluator
/assign
Given the existing CEL-based transformation support, your proposal seems reasonable (but let's see what the main maintainers say).
That said, this is not difficult to solve in a component-centric way, without complex SDK or backend features:
```python
from kfp import dsl

# query_json and component1-3 are components assumed to be defined elsewhere.
@dsl.pipeline
def my_pipeline(config: dict):
    # Single-level access
    db_host = query_json(dict=config, path=".db_host").output
    component1(db_host=db_host)
    # Nested access
    host = query_json(dict=config, path=".database.host").output
    component2(host=host)
    # Sub-dict passing
    db_config = query_json(dict=config, path=".database").output
    component3(db_config=db_config)
```
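The comment does not show how query_json is defined. The following is a minimal sketch of the lookup it might perform, assuming jq-style dot paths (".db_host", ".database.host", and "." for the whole input). In a real pipeline, this logic would be wrapped as a KFP lightweight component with a declared output type. That wrapping is omitted so the sketch runs without kfp installed, and the function itself is hypothetical:

```python
from collections.abc import Mapping
from typing import Any


def query_json(dict: Mapping, path: str) -> Any:
    """Resolve a jq-style dot path such as ".database.host" against a dict.

    The parameter is named `dict` only to mirror the call site in the comment.
    Only plain object keys are supported (no array indices, filters, or
    quoted keys).
    """
    if not path.startswith("."):
        raise ValueError(f"path must start with '.', got {path!r}")
    current: Any = dict
    for key in path[1:].split("."):
        if key == "":
            continue  # skip empty segments, e.g. the root path "."
        if not isinstance(current, Mapping):
            raise TypeError(f"cannot look up {key!r} in {type(current).__name__}")
        current = current[key]
    return current


config = {"db_host": "db.internal", "database": {"host": "pg", "port": 5432}}
print(query_json(dict=config, path=".db_host"))        # db.internal
print(query_json(dict=config, path=".database.host"))  # pg
print(query_json(dict=config, path=".database"))       # {'host': 'pg', 'port': 5432}
```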
This style supports arbitrarily complex processing without resorting to SDK-level transformations.
Another benefit is that it translates trivially to data produced by other components (for example, for HPO) and to cases that require more complex processing:
```python
from kfp import dsl

# get_config, query_json and component1-3 are components assumed to be defined elsewhere.
@dsl.pipeline
def my_pipeline():
    config = get_config(date="2025-11-08").output
    # Single-level access
    db_host = query_json(dict=config, path=".db_host").output
    component1(db_host=db_host)
    # Nested access
    host = query_json(dict=config, path=".database.host").output
    component2(host=host)
    # Sub-dict passing
    db_config = query_json(dict=config, path=".database").output
    component3(db_config=db_config)
```