kedro-azureml
How to correctly create uri_file data assets?
I am trying to save a data asset as a uri_file, but the dataset is incorrectly saved as a uri_folder when I launch kedro azureml run.
I have the following catalog:
projects_train_raw_local:
  type: pandas.CSVDataSet
  filepath: data/01_raw/dataset.csv
projects_train_raw:
  type: kedro_azureml.datasets.AzureMLAssetDataSet
  azureml_dataset: projects_train_raw
  root_dir: data/00_azurelocals/
  versioned: True
  azureml_type: uri_file
  dataset:
    type: pandas.CSVDataSet
    filepath: "dataset.csv"
and the following pipeline, which just loads the local file and saves it as the data asset:
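from kedro.pipeline import Pipeline, node


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        nodes=[
            # Identity node: load the local CSV and save it as the Azure ML data asset
            node(
                func=lambda x: x,
                inputs="projects_train_raw_local",
                outputs="projects_train_raw",
                name="create_train_dataasset",
            )
        ]
    )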
I expected a new data asset to be created on Azure as a uri_file. However, I get the following info on Azure:
It seems my file is not saved correctly. If I am not mistaken, this corresponds to the following part of cli.py:
    # 2. Save dummy outputs
    # In distributed computing, it will only happen on nodes with rank 0
    if not pipeline_data_passing and is_distributed_master_node():
        for data_path in azure_outputs.values():
            (Path(data_path) / "output.txt").write_text("#getindata")
    else:
        logger.info("Skipping saving Azure outputs on non-master distributed nodes")
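To illustrate why I suspect this snippet, here is a minimal standalone sketch. It is not kedro-azureml code: the helper name save_dummy_output and the temporary paths are made up for illustration, and I am only assuming that a uri_folder output path behaves like a directory while a uri_file output path would be a single file. It shows that writing Path(data_path) / "output.txt" only works when data_path is a directory.

```python
import tempfile
from pathlib import Path


def save_dummy_output(data_path: str) -> Path:
    """Hypothetical helper that mimics the write performed in the quoted cli.py excerpt."""
    target = Path(data_path) / "output.txt"
    target.write_text("#getindata")
    return target


def demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        # Case 1 (assumption): the output path is a directory, as for a folder-style output.
        folder_output = Path(tmp) / "folder_output"
        folder_output.mkdir()
        written = save_dummy_output(str(folder_output))
        results["folder"] = written.read_text()

        # Case 2 (assumption): the output path is a single file, as I would expect for uri_file.
        file_output = Path(tmp) / "dataset.csv"
        file_output.write_text("a,b\n1,2\n")
        try:
            save_dummy_output(str(file_output))
            results["file"] = "written"
        except OSError as exc:
            # A regular file cannot contain "output.txt", so the write fails.
            results["file"] = type(exc).__name__
    return results


if __name__ == "__main__":
    print(demo())
```

If this reading is right, the dummy-output step treats every Azure output as a folder, which would be consistent with the asset ending up as a uri_folder.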
How can I correctly create a uri_file data asset?