dag-factory
Add json schema validation for yaml files
Hi there 👋
When I write code I always use a linter; my IDE tells me when I'm about to make a mess by not importing a certain library or not using the right type.
This does not apply to config files. They're super nice because config "always compiles" 😉, but the code running those config files doesn't always like them. So, to have a little more control, I try to implement "schemas" for those config files. They're looser than "classes", language independent, and easier to maintain and run.
It would be nice to have something like
import yaml
from jsonschema import validate

# JSON Schema for a DAG config, written as YAML for readability.
SCHEMA = """
type: object
properties:
  default_args:
    type: object
    properties:
      start_date:
        type: string
  schedule_interval:
    type: string
  catchup:
    type: boolean
  tasks:
    type: array
    items:
      type: object
      properties:
        operator:
          type: string
        dependencies:
          type: array
          items:
            type: string
"""

if __name__ == '__main__':
    # Raises jsonschema.ValidationError if the config does not match the schema.
    with open('path/to/my_dag.yml') as config_file:
        validate(
            instance=yaml.safe_load(config_file),
            schema=yaml.safe_load(SCHEMA),
        )
But this schema is one I generated myself: it doesn't distinguish between Airflow versions and it isn't kept up to date with the library. It would be nice to have a check like this. PyCharm picks up such schemas to validate things like .gitlab-ci.yml, so we could just as well be writing DAGs the "safest" way, with free linting.
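To make the "free linting" idea a bit more concrete, here is a minimal sketch that reports every schema violation at once instead of stopping at the first one, roughly the way an IDE would underline several problems. The trimmed `SCHEMA` and the `lint_config` helper are hypothetical names made up for illustration; they are not part of dag-factory. `Draft7Validator.iter_errors` is the jsonschema call doing the work.

```python
from jsonschema import Draft7Validator

# Hypothetical, heavily trimmed schema; not an official dag-factory schema.
SCHEMA = {
    "type": "object",
    "properties": {
        "schedule_interval": {"type": "string"},
        "catchup": {"type": "boolean"},
    },
}


def lint_config(config):
    """Return one 'path: message' string per schema violation."""
    validator = Draft7Validator(SCHEMA)
    # iter_errors yields all violations lazily; sort by location for stable output.
    errors = sorted(validator.iter_errors(config), key=lambda e: list(e.absolute_path))
    return [
        "{}: {}".format("/".join(str(p) for p in e.absolute_path) or "<root>", e.message)
        for e in errors
    ]


# Both fields have the wrong type, so two problems are reported.
problems = lint_config({"schedule_interval": 5, "catchup": "yes"})
for line in problems:
    print(line)
```

A config that matches the schema gives back an empty list, so the same helper could sit behind a pre-commit hook or a CI step.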
Also, a thought: this way the codebase could ease up a little on the YAML validation. Reducing the core codebase makes it easier to read and maintain.
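As a rough sketch of what "easing up" might look like, the config could be validated once right after parsing, so the code that builds DAGs can assume a well-formed structure. Everything here is an assumption for illustration: `load_validated_config` and this small `SCHEMA` are hypothetical and do not reflect how dag-factory actually loads or checks its files.

```python
import yaml
from jsonschema import ValidationError, validate

# Hypothetical minimal schema: a list of tasks, each naming an operator.
SCHEMA = {
    "type": "object",
    "required": ["tasks"],
    "properties": {
        "tasks": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["operator"],
                "properties": {"operator": {"type": "string"}},
            },
        }
    },
}


def load_validated_config(text):
    """Parse YAML text and validate it; raise ValueError on a violation."""
    config = yaml.safe_load(text)
    try:
        validate(instance=config, schema=SCHEMA)
    except ValidationError as err:
        raise ValueError("invalid DAG config: " + err.message) from err
    return config


# After this call, downstream code can rely on config["tasks"] being a list.
config = load_validated_config(
    "tasks:\n  - operator: airflow.operators.bash.BashOperator\n"
)
```

With a single entry point like this, the scattered type and key checks further down could in principle be dropped, which is the reduction in core code I had in mind.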
Regardless of what you think of this proposal, thanks for your work, and for open sourcing this library. It rocks 🚀 !
This looks like a great idea! I'm not sure I'll get to it for a bit though, but feel free to submit a PR!