azure-sdk-for-python
Add dependencies between components without the need to pass data through them
Azure ML SDK V2 components & LLM flows as components
Describe the bug
Hi guys! I have needed to create pipelines with components where some of them don't necessarily need any particular output from the previous one, and I have searched the documentation for a way to declare a dependency between components so that they run in series (not in parallel). Is there a specific method/variable in the YAML files to do this? I have been making that connection by passing "dummy" data from one component to the next, even though the next one doesn't actually use that data for anything, and it works, but I would like to know the correct way if one exists.
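For reference, here is roughly what my current "dummy" workaround looks like; the component names, YAML paths, compute target, and the dummy_in/dummy_out ports are just placeholders:
from azure.ai.ml import dsl, load_component

comp1 = load_component(source="comp1.yaml")  # placeholder YAML component definitions
comp2 = load_component(source="comp2.yaml")

@dsl.pipeline(compute="cpu-cluster")  # placeholder compute target
def serial_pipeline():
    comp1_job = comp1()
    # comp2 never reads dummy_in at runtime; the edge only forces it to wait for comp1.
    comp2_job = comp2(dummy_in=comp1_job.outputs.dummy_out)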
If any other info is needed to make this clearer, please tell me. Thanks in advance!
Hi @anaarmas-sys, thank you for opening an issue! Is there a particular SDK library that you're using in this scenario? I see that the issue linked in this thread is for Azure ML, so is your question here also specific to ML? Knowing this will help us triage your issue effectively -- thank you!
Hi mccoyp! The issue concerns Azure ML SDK V2 components (defined with YAML) and components built from files created with Prompt Flow. Let's say:
Case a) I want to link two components created with SDK V2 (YAML definition): comp1 ---> comp2. comp2 has to run after comp1 finishes, without sending any particular data to comp2. I'm using Azure AI | Machine Learning Studio.
Case b) I want to link two components, comp1 ---> comp2, where comp1 was created with SDK V2 (YAML definition) and comp2 was created from files that run a flow using LLM models from Prompt Flow, etc. Therefore, comp2 has a different YAML file (normally named flow.dag.yaml). Again, comp2 has to run after comp1 finishes, without sending any particular data to comp2. I'm using Azure AI | Machine Learning Studio.
I hope this explanation is a little clearer.
Thanks a lot mccoyp!
Thank you for the details, @anaarmas-sys. Since this seems to be related to your previous issue, I'll tag the same folks to see if they can either help again or route this to the appropriate crowd. cc @yunjie-hub, @ahughes-msft, @nemanjarajic
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.
@eniac871 could you please help here.
Any update on this so far, please?
Hi guys! Could you please give a suggestion for the issue above? Let me give you a specific example:
Let's say that I have the following function to run a pipeline in Azure AI ML:
@dsl.pipeline(compute=compute_name)
def flow_pipeline(
    data_asset_name,
    azure_embeddings_deployment,
    azure_oai_endpoint,
    azure_oai_key,
    azure_search_endpoint,
    azure_search_admin_key,
    index_name,
):
    component1_job = component1(
        data_asset_name=data_asset_name,
        azure_embeddings_deployment=azure_embeddings_deployment,
        azure_oai_endpoint=azure_oai_endpoint,
        azure_oai_key=azure_oai_key,
        azure_search_endpoint=azure_search_endpoint,
        azure_search_admin_key=azure_search_admin_key,
        index_name=index_name,
    )
    promptflow_component_job = promptflow_component(
        data=Input(path=flow_data_path, type=AssetTypes.URI_FILE),
        question="${data.question}",
        vectorstore_fields=component1_job.outputs.vectorstore_fields,
        index_name=index_name,
    )
    return {"pipeline_job_score_result": promptflow_component_job.outputs.flow_outputs}
component1 was created using a YAML definition to run a Python script. The promptflow_component is a prompt flow registered as a component in Azure AI ML.
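Roughly how I reference them before building the pipeline; the paths, names, and versions here are placeholders:
from azure.ai.ml import MLClient, load_component
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# SDK v2 command component defined in YAML (placeholder path)
component1 = load_component(source="component1.yaml")

# prompt flow previously registered as a component in the workspace (placeholder name/version)
promptflow_component = ml_client.components.get(name="my_flow_component", version="1")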
I have tried to connect both components as usual with vectorstore_fields = component1_job.outputs.vectorstore_fields, which is a uri_folder.
And in flow.dag.yaml I added the input vectorstore_fields as a string, but it does not work: Cannot resolve the parameter value successfully. The parameter value does not meet the requirements. Linked parameter name is vectorstore_fields.
(I have tried other changes to that variable, etc. Nothing has worked for me so far.)
The thing is that I want to run component1 and, once it completes, run the promptflow_component. I don't need to send anything relevant from the first component to the second one, but normally, SDK v2 pipelines run components in series by passing outputs from one component to another.
Another way: in SDK v1 there are PipelineSteps, so you can do pipeline = Pipeline(steps=[first_component, second_component]).
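Something like this, from memory, using StepSequence from azureml.pipeline.core to run the steps in order (treat the details as approximate):
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, StepSequence

ws = Workspace.from_config()

# first_component / second_component are PipelineStep objects (e.g. PythonScriptStep) defined elsewhere.
step_sequence = StepSequence(steps=[first_component, second_component])
pipeline = Pipeline(workspace=ws, steps=step_sequence)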
How do I do the same with SDK v2?
Thank you in advance!
Hi @anaarmas-sys, thanks for reaching out.
- AzureML SDK v2 doesn't support execution order control without data dependencies for now.
- A flow is loaded into a component with specific interfaces; you can't add/remove ports in flow.dag.yaml, or it will be invalid when used without Azure.
As a workaround, you can add an extra command node that accepts the original input data of the flow along with a signal from the prior node:
@dsl.pipeline
def pipeline_with_flow(data, index_name):
    component1_job = component1()
    extra_job = pass_through_component(data=data, signal=component1_job.outputs.vectorstore_fields)
    promptflow_component_job = promptflow_component(
        data=extra_job.outputs.output,
        question="${data.question}",
        index_name=index_name,
    )
    return {"pipeline_job_score_result": promptflow_component_job.outputs.flow_outputs}
Hi @elliotzh, I have tried your suggestion but still have the same problem, because component1 sends uri_folder/uri_file/mltable types as outputs, and the promptflow component does not recognize those types when they come from a component (apparently).
Hi @anaarmas-sys, what do you mean by "does not recognize"? The output of component1 won't be sent to the flow component. Instead, data will be passed through to the flow component.
Could you please share your latest dsl pipeline code?