Using ParallelRunStep output as an input to another step

Open alexszym opened this issue 3 years ago • 0 comments

This is specific to the Python SDK.

I'm trying to use the output of a ParallelRunStep as the input to another step, and I haven't been able to find an example of this anywhere. My use case is simple: I want to apply some additional transforms to the pipeline output and save the result as a CSV.

The closest example I've been able to find is here: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification#download-and-review-output. However, it has a bug: the delimiter for "parallel_run_step.txt" is a space, not a colon.
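To illustrate the delimiter issue, here is a minimal sketch that parses the same text with both separators. The sample rows are made up for illustration (they are not taken from a real run) and assume the file holds one space-separated record per line with no header:

```python
import io

import pandas as pd

# Hypothetical sample of what parallel_run_step.txt might contain:
# one space-separated record per line, no header row.
sample = "img_001.png cat\nimg_002.png dog\n"

# Splitting on a space yields one column per field.
by_space = pd.read_csv(io.StringIO(sample), delimiter=" ", header=None)

# Splitting on a colon finds no separator, so each line stays in one column.
by_colon = pd.read_csv(io.StringIO(sample), delimiter=":", header=None)

print(by_space.shape)  # (2, 2)
print(by_colon.shape)  # (2, 1)
```

With the colon delimiter nothing fails loudly; the whole record just ends up in a single column, which is easy to miss.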

Here is the code that I eventually got working in my transform.py, which runs after the ParallelRunStep:

transform.py

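import argparse
import os

import pandas as pd
from azureml.core import Run

parser = argparse.ArgumentParser(description="Transform")
parser.add_argument('--output_path', dest="output_path", required=True)

# Ignore any extra arguments the pipeline passes to the script.
args, _ = parser.parse_known_args()

# "input_data" matches the name given to as_input() in the pipeline step.
run = Run.get_context()
input_dir = run.input_datasets["input_data"]

input_data_path = os.path.join(input_dir, "parallel_run_step.txt")

# parallel_run_step.txt is space-delimited and has no header row.
input_df = pd.read_csv(input_data_path, delimiter=" ", header=None)

# Transform (placeholder: apply the additional transforms here)
transformed_df = input_df

# Make sure the output directory exists before writing.
os.makedirs(args.output_path, exist_ok=True)
transformed_df.to_csv(os.path.join(args.output_path, "processed_data.csv"))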
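For anyone who wants to check the read/transform/write logic without submitting a pipeline, the following is a rough local sketch. It is not part of my pipeline: the function name `transform_file`, the sample rows, and the column-renaming transform are all placeholders I made up, and it uses temporary directories in place of the mounted step input and output.

```python
import os
import tempfile

import pandas as pd


def transform_file(input_dir: str, output_dir: str) -> str:
    """Read parallel_run_step.txt from input_dir and write processed_data.csv."""
    input_path = os.path.join(input_dir, "parallel_run_step.txt")
    df = pd.read_csv(input_path, delimiter=" ", header=None)

    # Placeholder transform (an assumption): name the columns positionally.
    df.columns = [f"col_{i}" for i in range(df.shape[1])]

    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, "processed_data.csv")
    df.to_csv(output_path, index=False)
    return output_path


# Local dry run with made-up data standing in for the ParallelRunStep output.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, "in")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "parallel_run_step.txt"), "w") as f:
        f.write("a.png 0.91\nb.png 0.12\n")

    out_path = transform_file(in_dir, os.path.join(tmp, "out"))
    result = pd.read_csv(out_path)

print(list(result.columns))
```

In the real script, `input_dir` would come from `run.input_datasets["input_data"]` and `output_dir` from `--output_path`.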

PythonScriptStep

    transform_step = PythonScriptStep(
        source_directory=src_dir,
        name="transform",
        script_name="transform.py",
        compute_target=compute_target,
        runconfig=aml_run_config,
        inputs=[parallel_step_output.as_input('input_data')],
        arguments=["--output_path", saved_output],
    )

It would be great to provide a similar example in the documentation and to fix the wrong delimiter in the existing examples.

alexszym avatar Jan 18 '22 12:01 alexszym