Add schema validation to config files
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3029
Originally created by @kgpayne on 2021-10-28 09:58:07
Problem to solve
Today, config is loaded from YAML into a dict and passed directly to the Canonical class to create an object representation. This approach is naive - no validation is done on the input YAML file. This is even more important to add after #2456 as 'subfiles' support a subset of config that is definable in meltano.yml (i.e. they have different schemas).
Target audience
Anyone using the Meltano.
Further details
This is somewhat related to
- https://github.com/meltano/meltano/issues/2925 and
- https://github.com/meltano/meltano/issues/2133
on refactoring the way we handle and represent config internally.
Proposal
- define schemas for each config type
- check the shape of config dictionaries against their expected schema on load and dump
What does success look like, and how can we measure that?
Invalid config manifests as config load errors, rather than either 'unexpected parameter' errors or silent ignores of added config in unexpected places.
Regression test
(Ensure the feature doesn't cause any regressions)
- [ ] Write adequate test cases and submit test results
- [ ] Test results should be reviewed by a person from the team
Links / references
Please note that this was taken from GitLab, to be changed accordingly
Here are some recommendations after thinking about this a bit:
Here are a couple of recommendations:
- implement the validation as a warning first (don't fail). There may be some things that your schema validation would fail on, but are working today. Erroring on new validation errors should be considered a breaking change.
- Validate opportunistically. You can obviously dictate the overall schema of the file, but I believe it will be impossible to dictate the schema of the plugins. Therefore, the schema validation should support a plugin model where each of the taps/targets can provide a schema validation file and if present the schema validation process will use it. If such file is not present for a tap/target then the schema validation process will ignore that portion of the meltano.yml file.
- Validate fully. The validation process should validate the entire file and then report all of the issues found rather than erroring and quitting on the first issue.
-
meltanocli override. The cli needs a flag to either skip validation or just ignore any errors. As this is an open source project, it's entirely likely a plugin contributor could make a mistake in validation schema and it should be super easy for someone running the CLI to ignore errors. Perhaps the best bet to handle #1 and #4 is a new argument--validationwith possible values of "disabled" and "strict" where
- by default if the argument is not used, validation runs and issues are reported as warnings, but the process continues
- "disabled" - skips the validation entirely (doesn't run, no output)
- "strict" - issues are reported as errors and the process stops after completing the validation
This should also be supported in the meltano.yml file as
default_validation. I don't want to have to remember every time how to a particular project should be run. However, if I use the cli argument or an env variable, it will override the default behavior specified in the file - I believe this is just how default_environment works.
- Per plugin override. There also needs to be a way to specify the "--validation" behavior on a per plugin-basis. Just because one plugin has broken validation, I don't want to disable or ignore all other validation. I believe this should be configured in each plugin's section of the meltano.yml.
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.
This is still relevant.
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.
still relevant stalebot 😑