meltano Add schema validation to config files

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3029

Originally created by @kgpayne on 2021-10-28 09:58:07

Problem to solve

Today, config is loaded from YAML into a dict and passed directly to the Canonical class to create an object representation. This approach is naive - no validation is done on the input YAML file. This is even more important to add after #2456 as 'subfiles' support a subset of config that is definable in meltano.yml (i.e. they have different schemas).

Target audience

Anyone using the Meltano.

Further details

This is somewhat related to

https://github.com/meltano/meltano/issues/2925 and
https://github.com/meltano/meltano/issues/2133

on refactoring the way we handle and represent config internally.

Proposal

define schemas for each config type
check the shape of config dictionaries against their expected schema on load and dump

What does success look like, and how can we measure that?

Invalid config manifests as config load errors, rather than either 'unexpected parameter' errors or silent ignores of added config in unexpected places.

Regression test

(Ensure the feature doesn't cause any regressions)

[ ] Write adequate test cases and submit test results
[ ] Test results should be reviewed by a person from the team

Links / references

Please note that this was taken from GitLab, to be changed accordingly

Oct 28 '21 09:10 MeltyBot

View 6 previous comments from the original issue on GitLab

May 30 '22 09:05 MeltyBot

Here are some recommendations after thinking about this a bit:

Here are a couple of recommendations:

implement the validation as a warning first (don't fail). There may be some things that your schema validation would fail on, but are working today. Erroring on new validation errors should be considered a breaking change.
Validate opportunistically. You can obviously dictate the overall schema of the file, but I believe it will be impossible to dictate the schema of the plugins. Therefore, the schema validation should support a plugin model where each of the taps/targets can provide a schema validation file and if present the schema validation process will use it. If such file is not present for a tap/target then the schema validation process will ignore that portion of the meltano.yml file.
Validate fully. The validation process should validate the entire file and then report all of the issues found rather than erroring and quitting on the first issue.
meltano cli override. The cli needs a flag to either skip validation or just ignore any errors. As this is an open source project, it's entirely likely a plugin contributor could make a mistake in validation schema and it should be super easy for someone running the CLI to ignore errors. Perhaps the best bet to handle #1 and #4 is a new argument --validation with possible values of "disabled" and "strict" where

by default if the argument is not used, validation runs and issues are reported as warnings, but the process continues
"disabled" - skips the validation entirely (doesn't run, no output)
"strict" - issues are reported as errors and the process stops after completing the validation This should also be supported in the meltano.yml file as default_validation. I don't want to have to remember every time how to a particular project should be run. However, if I use the cli argument or an env variable, it will override the default behavior specified in the file - I believe this is just how default_environment works.

Per plugin override. There also needs to be a way to specify the "--validation" behavior on a per plugin-basis. Just because one plugin has broken validation, I don't want to disable or ignore all other validation. I believe this should be configured in each plugin's section of the meltano.yml.

Aug 14 '22 13:08 seajhawk

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

Apr 26 '23 18:04 stale[bot]

This is still relevant.

Apr 26 '23 19:04 WillDaSilva

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

Aug 24 '23 19:08 stale[bot]

still relevant stalebot 😑

Aug 24 '23 20:08 tayloramurphy