
Add (intermediate) results to pipeline_state to make it easier to accumulate results without keeping additional client-side state

Open · davidsbatista opened this issue 7 months ago • 1 comment

davidsbatista · May 30 '25 12:05

@davidsbatista @julian-risch I have drafted a solution. If it looks OK, I will create a PR.

Proposed Solution

Add a minimal, opt-in feature that automatically accumulates intermediate results in the pipeline's internal state:

Core Changes

  1. Add pipeline_state property - A read-only view of accumulated intermediate results
  2. Add accumulate_intermediate_results parameter - Simple boolean flag for pipeline run methods
  3. Automatic accumulation - When enabled, collects outputs from all components during execution

🔧 How it works

# Before (manual approach)
pipeline.run(data, include_outputs_from={"comp1", "comp2", "comp3"})

# After (automatic approach)  
pipeline.run(data, accumulate_intermediate_results=True)

# Access accumulated state anytime
state = pipeline.pipeline_state
print(f"Component 'comp1' output: {state.get('comp1')}")

Implementation Details

The implementation is extremely lightweight:

  • Adds _pipeline_state: Dict[str, Any] = {} to the base pipeline class
  • When accumulate_intermediate_results=True, automatically sets include_outputs_from to all components
  • During execution, populates _pipeline_state with component outputs
  • Exposes read-only access via pipeline_state property
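
To make the four steps above concrete, here is a minimal sketch of how they could fit together. It is not Haystack code: `ToyPipeline`, `add_component`, and the linear "each component feeds the next" execution model are hypothetical simplifications chosen for illustration. Two assumptions differ slightly from the bullets: `_pipeline_state` is created in `__init__` rather than as a class-level `{}` so that instances do not share one dict, and the read-only view is built with `types.MappingProxyType`, which is one possible way to do it.

```python
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping, Optional, Set


class ToyPipeline:
    """Hypothetical, simplified pipeline. NOT Haystack's Pipeline class."""

    def __init__(self) -> None:
        # Components run in insertion order; each receives the previous output.
        self._components: Dict[str, Callable[[Any], Any]] = {}
        # Assumption: created per instance so pipelines never share state.
        self._pipeline_state: Dict[str, Any] = {}

    def add_component(self, name: str, func: Callable[[Any], Any]) -> None:
        self._components[name] = func

    @property
    def pipeline_state(self) -> Mapping[str, Any]:
        # The proxy reflects internal updates but rejects writes by callers.
        return MappingProxyType(self._pipeline_state)

    def run(
        self,
        data: Any,
        include_outputs_from: Optional[Set[str]] = None,
        accumulate_intermediate_results: bool = False,
    ) -> Dict[str, Any]:
        if accumulate_intermediate_results:
            # The flag simply expands include_outputs_from to all components.
            include_outputs_from = set(self._components)
        include = include_outputs_from or set()

        outputs: Dict[str, Any] = {}
        value = data
        names = list(self._components)
        for index, name in enumerate(names):
            value = self._components[name](value)
            is_last = index == len(names) - 1
            # The final component's output is always returned.
            if is_last or name in include:
                outputs[name] = value
            if accumulate_intermediate_results:
                # State is not cleared between runs, so results accumulate.
                self._pipeline_state[name] = value
        return outputs


pipe = ToyPipeline()
pipe.add_component("comp1", lambda x: x + 1)
pipe.add_component("comp2", lambda x: x * 2)

result = pipe.run(3, accumulate_intermediate_results=True)
print(result)                     # {'comp1': 4, 'comp2': 8}
print(dict(pipe.pipeline_state))  # {'comp1': 4, 'comp2': 8}
```

Whether the state should be cleared at the start of each run, or kept across runs as in this sketch, is a design question the proposal leaves open.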

YassinNouh21 · Jun 01 '25 09:06