Using ParallelRunStep output as an input to another step
This is specific to the Python SDK.
I'm attempting to use the output of a ParallelRunStep as an input to another step, and I haven't been able to find an example of this anywhere. My use case is simple: I want to apply some additional transforms to the ParallelRunStep output and save the result as a CSV.
The closest example I've been able to find is here: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification#download-and-review-output. However, it has a bug: the delimiter in "parallel_run_step.txt" is whitespace, not a colon.
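To show what goes wrong with the colon delimiter, here is a minimal sketch that runs locally without Azure ML. The sample rows are made up for illustration; the real contents of parallel_run_step.txt depend on what the entry script returns.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows standing in for real ParallelRunStep output.
sample = "img_001.jpg cat\nimg_002.jpg dog\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "parallel_run_step.txt")
    with open(path, "w") as f:
        f.write(sample)

    # A space delimiter splits each row into two columns.
    space_df = pd.read_csv(path, delimiter=" ", header=None)
    # A colon delimiter finds no separator, so each row stays in one column.
    colon_df = pd.read_csv(path, delimiter=":", header=None)

print(space_df.shape)  # (2, 2)
print(colon_df.shape)  # (2, 1)
```

With the colon delimiter nothing fails loudly; the frame just ends up with a single column, which is easy to miss.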
Here is the code that I eventually got working in my transform.py, which runs after the ParallelRunStep:
transform.py
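import argparse
import os

import pandas as pd
from azureml.core import Run

parser = argparse.ArgumentParser(description="Transform")
parser.add_argument("--output_path", dest="output_path", required=True)
args, _ = parser.parse_known_args()

# The ParallelRunStep output is exposed under the name passed to as_input().
run = Run.get_context()
input_dir = run.input_datasets["input_data"]
input_data_path = os.path.join(input_dir, "parallel_run_step.txt")

# The file is space-delimited, not colon-delimited.
input_df = pd.read_csv(input_data_path, delimiter=" ", header=None)

# Transform (placeholder: replace with the actual transforms)
transformed_df = input_df

# Make sure the output directory exists before writing.
os.makedirs(args.output_path, exist_ok=True)
transformed_df.to_csv(os.path.join(args.output_path, "processed_data.csv"))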
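The read, transform, and write logic of the script can also be checked locally before submitting a pipeline. This is only a sketch: the function name transform_output, the column names, and the sample rows are all hypothetical, and the rename stands in for whatever transforms are actually needed.

```python
import os
import tempfile

import pandas as pd


def transform_output(input_dir, output_path):
    """Read parallel_run_step.txt from input_dir and write processed_data.csv."""
    input_df = pd.read_csv(
        os.path.join(input_dir, "parallel_run_step.txt"),
        delimiter=" ",
        header=None,
    )
    # Placeholder transform (hypothetical): give the columns readable names.
    transformed_df = input_df.rename(columns={0: "file", 1: "prediction"})
    os.makedirs(output_path, exist_ok=True)
    out_file = os.path.join(output_path, "processed_data.csv")
    transformed_df.to_csv(out_file, index=False)
    return out_file


# Exercise the function against made-up rows in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    in_dir = os.path.join(tmp_dir, "in")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "parallel_run_step.txt"), "w") as f:
        f.write("a.jpg 1\nb.jpg 0\n")
    out_file = transform_output(in_dir, os.path.join(tmp_dir, "out"))
    result = pd.read_csv(out_file)

print(list(result.columns))
```

Inside the pipeline, the same function could be called with the directory from run.input_datasets and the --output_path argument.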
PythonScriptStep
from azureml.pipeline.steps import PythonScriptStep

transform_step = PythonScriptStep(
    source_directory=src_dir,
    name="transform",
    script_name="transform.py",
    compute_target=compute_target,
    runconfig=aml_run_config,
    inputs=[parallel_step_output.as_input("input_data")],
    arguments=["--output_path", saved_output],
)
It would be great to add a similar example to the documentation and to fix the wrong delimiter in the existing examples.