kedro-airflow-k8s
ValueError: Failed to format pattern '${xxx}': no config value found, no default provided
Hello,
With: kedro 0.17.4, kedro-airflow-k8s 0.7.3, Python 3.8.12
I have a templated catalog:
training_data:
  type: spark.SparkDataSet
  filepath: data/${folders.intermediate}/training_data
  file_format: parquet
  save_args:
    mode: 'overwrite'
  layer: intermediate
with the parameter set in my globals.yml:
folders:
  intermediate: 02_intermediate
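For context, the templated values are resolved by kedro's TemplatedConfigLoader; in a kedro 0.17.x project it is typically registered through a register_config_loader hook, roughly like this (a simplified sketch, not my exact hooks.py):

from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths):
        # globals_pattern is what makes ${folders.intermediate} resolvable
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",  # picks up conf/base/globals.yml
        )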
And when I run:
kedro airflow-k8s compile
I get the following error:
Traceback (most recent call last):
File "/Users/user/miniconda3/envs/kedro/bin/kedro", line 8, in <module>
sys.exit(main())
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 265, in main
cli_collection()
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 210, in main
super().main(
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/cli.py", line 64, in compile
) = get_dag_filename_and_template_stream(
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 170, in get_dag_filename_and_template_stream
template_stream = _create_template_stream(
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 92, in _create_template_stream
pipeline_grouped=context_helper.pipeline_grouped,
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/context_helper.py", line 46, in pipeline_grouped
return TaskGroupFactory().create(self.pipeline, self.context.catalog)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 329, in catalog
return self._get_catalog()
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 365, in _get_catalog
conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 191, in get
return _format_object(config_raw, self._arg_dict)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
new_dict[key] = _format_object(value, format_dict)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
new_dict[key] = _format_object(value, format_dict)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in _format_object
return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in <lambda>
return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 242, in _format_string
raise ValueError(
ValueError: Failed to format pattern '${folders.intermediate}': no config value found, no default provided
With this conf/base/airflow-k8s.yaml:
host: https://airflow.url
output: dags
run_config:
  image: spark_image
  image_pull_policy: Always
  startup_timeout: 600
  namespace: namespace
  experiment_name: experiment
  run_name: experiment
  cron_expression: "@daily"
  description: "experiment Pipeline"
  service_account_name: namespace-vault
  volume:
    disabled: True
  macro_params: [ds, prev_ds]
  variables_params: []
I should add that kedro run works.
Do you have any hint?
Sorry, actually kedro run doesn't work either, so it did not seem to be coming from kedro-airflow-k8s.
However, when I uninstall kedro-airflow-k8s, kedro run works again.
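To double-check which hooks the plugin auto-registers, one can list kedro's plugin entry points (a minimal sketch; kedro discovers plugin hooks through the kedro.hooks entry point group):

import pkg_resources

# Print every hook implementation that installed plugins register with kedro
for entry_point in pkg_resources.iter_entry_points(group="kedro.hooks"):
    print(entry_point.name, "->", entry_point.module_name)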
It seems that now, with kedro-airflow-k8s 0.6.7 and with this conf/base/airflow-k8s.yaml:
# Base url of the Apache Airflow, should include the schema (http/https)
host: https://airflow.url
# Directory from where Apache Airflow is reading DAGs definitions
output: dags
# Configuration used to run the pipeline
run_config:
  # Name of the image to run as the pipeline steps
  image: experiment
  # Pull policy to be used for the steps. Use Always if you push the images
  # on the same tag, or Never if you use only local images
  image_pull_policy: IfNotPresent
  # Pod startup timeout in seconds
  startup_timeout: 600
  # Namespace for Airflow pods to be created
  namespace: airflow
  # Name of the Airflow experiment to be created
  experiment_name: experiment
  # Name of the dag as it's presented in Airflow
  run_name: experiment
  # Apache Airflow cron expression for scheduled runs
  cron_expression: "@daily"
  # Optional start date in format YYYYMMDD
  #start_date: "20210721"
  # Optional pipeline description
  #description: "Very Important Pipeline"
  # Comma separated list of image pull secret names
  #image_pull_secrets: my-registry-credentials
  # Service account name to execute nodes with
  #service_account_name: default
  # Optional volume specification
  volume:
    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default
    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi
    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environements (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]
    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False
    # Allows to specify fsGroup executing pipelines within containers
    # Default: root user group (to avoid issues with volumes in GKE)
    owner: 0
    # Tells if volume should not be used at all, false by default
    disabled: False
  # List of optional secrets specification
  secrets:
    # deploy_type: The type of secret deploy in Kubernetes, either `env` or
    # `volume`
    - deploy_type: "env"
      # deploy_target: (Optional) The environment variable when `deploy_type` `env`
      # or file path when `deploy_type` `volume` where expose secret. If `key` is
      # not provided deploy target should be None.
      deploy_target: "SQL_CONN"
      # secret: Name of the secrets object in Kubernetes
      secret: "airflow-secrets"
      # key: (Optional) Key of the secret within the Kubernetes Secret if not
      # provided in `deploy_type` `env` it will mount all secrets in object
      key: "sql_alchemy_conn"
  # Apache Airflow macros to be exposed for the parameters
  # List of macros can be found here:
  # https://airflow.apache.org/docs/apache-airflow/stable/macros-ref.html
  macro_params: [ds, prev_ds]
  # Apache Airflow variables to be exposed for the parameters
  variables_params: [env]
  # Optional resources specification
  #resources:
    # Default configuration used by all nodes that do not declare the
    # resource configuration. It's optional. If node does not declare the resource
    # configuration, __default__ is assigned by default, otherwise cluster defaults
    # will be used.
    #__default__:
      # Optional labels to be put into pod node selector
      #node_selectors:
        #Labels are user provided key value pairs
        #node_pool_label/k8s.io: example_value
      # Optional labels to apply on pods
      #labels:
        #running: airflow
      # Optional annotations to apply on pods
      #annotations:
        #iam.amazonaws.com/role: airflow
      # Optional list of kubernetes tolerations
      #tolerations:
        #- key: "group"
        #value: "data-processing"
        #effect: "NoExecute"
        #- key: "group"
        #operator: "Equal",
        #value: "data-processing",
        #effect: "NoSchedule"
      #requests:
        #Optional amount of cpu resources requested from k8s
        #cpu: "1"
        #Optional amount of memory resource requested from k8s
        #memory: "1Gi"
      #limits:
        #Optional amount of cpu resources limit on k8s
        #cpu: "1"
        #Optional amount of memory resource limit on k8s
        #memory: "1Gi"
    # Other arbitrary configurations to use
    #custom_resource_config_name:
      # Optional labels to be put into pod node selector
      #labels:
        #Labels are user provided key value pairs
        #label_key: label_value
      #requests:
        #Optional amount of cpu resources requested from k8s
        #cpu: "1"
        #Optional amount of memory resource requested from k8s
        #memory: "1Gi"
      #limits:
        #Optional amount of cpu resources limit on k8s
        #cpu: "1"
        #Optional amount of memory resource limit on k8s
        #memory: "1Gi"
  # Optional external dependencies configuration
  #external_dependencies:
    # Can just select dag as a whole
    #- dag_id: upstream-dag
    # or detailed
    #- dag_id: another-upstream-dag
    # with specific task to wait on
    # task_id: with-precise-task
    # Maximum time (minute) to wait for the external dag to finish before this
    # pipeline fails, the default is 1440 == 1 day
    # timeout: 2
    # Checks if the external dag exists before waiting for it to finish. If it
    # does not exists, fail this pipeline. By default is set to true.
    # check_existence: False
    # Time difference with the previous execution to look at (minutes),
    # the default is 0 meaning no difference
    # execution_delta: 10
  # Optional authentication to MLflow API
  #authentication:
    # Strategy that generates the credentials, supported values are:
    # - Null
    # - GoogleOAuth2 (generating OAuth2 tokens for service account provided by
    #   GOOGLE_APPLICATION_CREDENTIALS)
    # - Vars (credentials fetched from airflow Variable.get - specify variable keys,
    #   matching MLflow authentication env variable names, in `params`,
    #   e.g. ["MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD"])
    #type: GoogleOAuth2
    #params: []
I can run kedro airflow-k8s compile and it works, but kedro run still gives the same error.
@stephanecollot Thanks for reporting an issue. If I'm not wrong it's related to getindata/kedro-kubeflow#72 - @szczeles can you confirm?
@em-pe Yep, it seems so. If we apply the same trick here, the issue should be gone.
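For context, the same ValueError can be reproduced whenever the catalog is read by a config loader that has no globals to substitute into ${folders.intermediate} - a minimal sketch, with the conf paths assumed to be the project defaults:

from kedro.config import TemplatedConfigLoader

conf_paths = ["conf/base", "conf/local"]

# With the globals file the pattern resolves fine
loader = TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")
catalog = loader.get("catalog*", "catalog*/**")
print(catalog["training_data"]["filepath"])  # data/02_intermediate/training_data

# Without globals, reading the catalog raises the error from the traceback:
# ValueError: Failed to format pattern '${folders.intermediate}':
#             no config value found, no default provided
TemplatedConfigLoader(conf_paths).get("catalog*", "catalog*/**")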
@stephanecollot As a temporary workaround, you can try adding these lines to your project's settings.py:
import sys

# Keep the plugin's auto-registered hooks only when the airflow-k8s CLI group is invoked
if 'airflow-k8s' not in sys.argv:
    DISABLE_HOOKS_FOR_PLUGINS = ("kedro-airflow-k8s",)
If your code works with this hack, it's definitely the same issue as https://github.com/getindata/kedro-kubeflow/issues/72
Thanks for your reply. Yes, I tried DISABLE_HOOKS_FOR_PLUGINS and kedro run works again.