
[sdk] Feature Request: Dict Parameter Access for Pipeline Parameters

Open wassimbensalem opened this issue 1 month ago • 2 comments

Problem Statement

Currently, when a pipeline has a dictionary parameter, the entire dictionary must be passed to every component that uses any part of it, even if the component needs only a single value or a subset of values. This leads to:

  • Unnecessary data exposure: Components receive more data than they need
  • Reduced code clarity: It's unclear which parts of the config each component uses
  • Security concerns: Sensitive data may be unnecessarily exposed to components
  • Tight coupling: Components become dependent on the entire config structure

Proposed Solution

Add support for extracting individual values from dictionary pipeline parameters using Pythonic dict-style syntax:

from kfp import dsl

@dsl.pipeline
def my_pipeline(config: dict):
    # Single-level access
    component1(db_host=config['db_host'])
    
    # Nested access
    component2(host=config['database']['host'])
    
    # Sub-dict passing
    component3(db_config=config['database'])
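A minimal sketch of how the SDK could capture this syntax when the pipeline function is traced, assuming a hypothetical `ParamRef` placeholder class (not part of the KFP SDK) whose `__getitem__` records the key path instead of indexing real data:

```python
from typing import Tuple


class ParamRef:
    """Hypothetical placeholder for a dict pipeline parameter.

    Indexing touches no data; it returns a new ParamRef that remembers the
    parameter name plus the chain of keys accessed so far.
    """

    def __init__(self, name: str, path: Tuple[str, ...] = ()):
        self.name = name
        self.path = path

    def __getitem__(self, key: str) -> "ParamRef":
        if not isinstance(key, str):
            raise TypeError(f"dict parameter keys must be str, got {type(key).__name__}")
        # Return a fresh reference so the parent can be reused for other keys.
        return ParamRef(self.name, self.path + (key,))

    def __repr__(self) -> str:
        return self.name + "".join(f"[{k!r}]" for k in self.path)


config = ParamRef("config")
print(config["db_host"])            # config['db_host']
print(config["database"]["host"])   # config['database']['host']
print(config["database"].path)      # ('database',)
```

Because every access returns a new reference, `config['database']` can be passed whole to one component and indexed further for another.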

Use Cases

  1. Configuration Management: Pass a large config dict to the pipeline, but only extract specific values for each component
  2. Security: Ensure components only receive the data they need
  3. Code Organization: Improve clarity about what data each component uses
  4. Nested Configs: Handle complex nested configuration structures

Expected Behavior

  • Support single-level dict access: config['key']
  • Support nested dict access: config['level1']['level2']
  • Support passing sub-dictionaries: config['subdict']
  • Generate appropriate CEL expressions at compile time
  • Runtime evaluation by the existing backend CEL evaluator (no backend changes needed)

Alternatives Considered

  1. Manual extraction in pipeline: Create intermediate variables - verbose and error-prone
  2. Component-level filtering: Components filter what they need - still exposes all data
  3. Separate parameters: Split config into many parameters - breaks encapsulation

Additional Context

This feature would build on the backend's existing CEL (Common Expression Language) evaluation capabilities. The backend already supports parseJson(string_value)["key"] expressions, so this would be an SDK-side enhancement that generates the appropriate CEL expressions at compile time.
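Assuming the backend accepts expressions of the form above, the compile-time step could reduce to serializing a recorded key path. The `to_cel` helper below is hypothetical; it quotes each key with `json.dumps`, on the assumption that a JSON string literal is also a valid double-quoted CEL string literal for ordinary keys, and it takes the `string_value` name from the example in this issue:

```python
import json
from typing import Sequence


def to_cel(keys: Sequence[str], source: str = "string_value") -> str:
    """Build a CEL expression that parses a JSON-encoded parameter and indexes into it."""
    expr = f"parseJson({source})"
    for key in keys:
        # json.dumps yields a double-quoted, escaped literal such as "db_host".
        expr += f"[{json.dumps(key, ensure_ascii=False)}]"
    return expr


print(to_cel(["db_host"]))           # parseJson(string_value)["db_host"]
print(to_cel(["database", "host"]))  # parseJson(string_value)["database"]["host"]
```

An empty key path yields plain `parseJson(string_value)`, i.e. the whole dict, so passing a sub-dict and passing a leaf value go through the same code path.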

Implementation Notes

  • Changes would be SDK-side only (Python)
  • No backend changes required
  • Fully backward compatible
  • Compile-time transformation to CEL expressions
  • Runtime type resolution via CEL evaluator
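To illustrate the runtime half, here is a rough Python emulation (not the backend's CEL evaluator) of what evaluating such an expression does: parse the serialized parameter once, then index key by key. The result has whatever type the value at that path has, which is what runtime type resolution amounts to here. The function name `evaluate_path` is hypothetical:

```python
import json
from typing import Any, Sequence


def evaluate_path(string_value: str, keys: Sequence[str]) -> Any:
    """Emulate parseJson(string_value)[k1][k2]... in plain Python."""
    value: Any = json.loads(string_value)
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            # CEL also reports an error when a map key is missing.
            raise KeyError(f"no such key: {key!r}")
        value = value[key]
    return value


raw = json.dumps({"db_host": "db.internal", "database": {"host": "10.0.0.5", "port": 5432}})
print(evaluate_path(raw, ["db_host"]))           # db.internal
print(evaluate_path(raw, ["database", "port"]))  # 5432
print(evaluate_path(raw, ["database"]))          # {'host': '10.0.0.5', 'port': 5432}
```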

wassimbensalem, Nov 07 '25 10:11

/assign

wassimbensalem, Nov 07 '25 10:11

Given the existing CEL-based transformation support, your proposal seems reasonable (but let's see what the main maintainers say).

That said, I think this is not difficult to solve in a component-centric way, without complex SDK or backend features:

from kfp import dsl

@dsl.pipeline
def my_pipeline(config: dict):
    # query_json: a component (not shown) that returns the value at `path`

    # Single-level access
    db_host = query_json(dict=config, path=".db_host").output
    component1(db_host=db_host)

    # Nested access
    host = query_json(dict=config, path=".database.host").output
    component2(host=host)

    # Sub-dict passing
    db_config = query_json(dict=config, path=".database").output
    component3(db_config=db_config)

This style supports arbitrarily complex processing without resorting to SDK-side transformation.
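The comment does not define `query_json`. A minimal pure-Python stand-in for its core logic might look like the sketch below, assuming a jq-like path syntax restricted to dot-separated dict keys (no array indices, filters, or quoted keys). The parameter is named `data` rather than `dict` to avoid shadowing the built-in:

```python
from typing import Any


def query_json(data: dict, path: str) -> Any:
    """Hypothetical stand-in: resolve a dot path such as ".database.host".

    "." alone returns the whole dict.
    """
    if not path.startswith("."):
        raise ValueError(f"path must start with '.', got {path!r}")
    value: Any = data
    for key in filter(None, path.split(".")):
        value = value[key]  # raises KeyError or TypeError on a bad path
    return value


config = {"db_host": "db.internal", "database": {"host": "10.0.0.5", "port": 5432}}
print(query_json(config, ".db_host"))        # db.internal
print(query_json(config, ".database.host"))  # 10.0.0.5
print(query_json(config, ".database"))       # {'host': '10.0.0.5', 'port': 5432}
```

Wrapping this in a KFP component (for example with `@dsl.component`) means declaring an output type up front, so a real version may need one variant per output type or may return the selected value as a JSON string.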

Another benefit is that this trivially extends to data produced by other components (for example, for HPO) and to cases that require more complex processing:

from kfp import dsl

@dsl.pipeline
def my_pipeline():
    config = get_config(date="2025-11-08").output

    # Single-level access
    db_host = query_json(dict=config, path=".db_host").output
    component1(db_host=db_host)

    # Nested access
    host = query_json(dict=config, path=".database.host").output
    component2(host=host)

    # Sub-dict passing
    db_config = query_json(dict=config, path=".database").output
    component3(db_config=db_config)

Ark-kun, Nov 08 '25 19:11