check-jsonschema

Support for custom YAML tags

Open DevOpsJeremy opened this issue 1 year ago • 2 comments

This issue requested support specifically for the !reference tag in a GitLab CI file; however, there doesn't appear to be support for other tags. In my case, I need Ansible-specific tags like !unsafe or !vault.

Example content:

controller_templates:
  - name: My job template
    extra_vars:
      pass: !unsafe '{{ my_value }}'

Output:

Traceback (most recent call last):
  File "/home/user/.local/bin/check-jsonschema", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/__init__.py", line 26, in main
    ret = checker.run()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 83, in run
    self._run()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 66, in _run
    errors = self._build_error_map()
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/checker.py", line 56, in _build_error_map
    for filename, doc in self._instance_loader.iter_files():
  File "/home/user/.local/lib/python3.6/site-packages/check_jsonschema/loaders/instance/__init__.py", line 63, in iter_files
    data = loadfunc(fp)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/main.py", line 343, in load
    return constructor.get_single_data()
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 113, in get_single_data
    return self.construct_document(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 123, in construct_document
    for _dummy in generator:
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 723, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 440, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 255, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 146, in construct_object
    data = self.construct_non_recursive_object(node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 181, in construct_non_recursive_object
    data = constructor(self, node)
  File "/home/user/.local/lib/python3.6/site-packages/ruamel/yaml/constructor.py", line 743, in construct_undefined
    node.start_mark,
ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag '!unsafe'
  in "job_templates.yml", line 337, column 18

The workaround in my case is to convert the content to JSON with yq and then validate it, which isn't ideal.

Is it possible to add support for custom tags like these? One option would be to let users supply custom constructors for their tags, which would then be passed through to the ruamel.yaml parser.

DevOpsJeremy, Sep 28 '24 09:09

Although a custom constructor is a more general solution -- and I like the generality in principle -- if yq is working for you as a preprocessor, then that means that you don't need to evaluate these tags in order for the content to validate. Stripping them out or ignoring them is enough. If there's an option to ignore unknown tags in ruamel.yaml, then that's something which would probably fit well with the existing CLI structure for baked-in transforms for Azure and GitLab.

If you really need customizations for the YAML parsing, I think the interface needs some serious thought. One solution I like would be to allow users to pass their own parser. Then you can do whatever you want with no limitations, but check-jsonschema's interface stays really simple to understand.

I only want to pursue more complex solutions if there's sufficient demand. I think we can meet your needs with "ignore unknown tags" as an option, so I'll look into that first.

sirosen, Sep 30 '24 14:09

I just discovered this tool via another tool that uses the provided pre-commit hook. So far I have only used the VS Code extension redhat.vscode-yaml, which provides validation in the editor, and I've been looking for a CLI option to validate files in CI as well. This tool is exactly what I need! Thanks a lot!

We use mkdocs-material which provides its own schema on top of that of mkdocs. However, there are a bunch of custom tags which make check-jsonschema fail to load the file:

[...]
plugins:
  - git-revision-date-localized:
      # only enable when building in CI to speed up local builds
      # https://squidfunk.github.io/mkdocs-material/setup/adding-a-git-repository/#+git-revision-date-localized.enabled
      enabled: !ENV [CI, false]
[...]
markdown_extensions:
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          format: !!python/name:pymdownx.superfences.fence_code_format

The values of both enabled and format cause failures such as:

Several files failed to parse.
  FailedFileLoadError: Failed to parse mkdocs.yml
    in "<path>/docs/.venv/lib/python3.13/site-packages/check_jsonschema/instance_loader.py", line 48
    >>> data: t.Any = self._parsers.parse_data_with_path(

    caused by

    ConstructorError: could not determine a constructor for the tag '!ENV'
      in "<byte string>", line 94, column 16:
              enabled: !ENV [CI, false]
                       ^ (line: 94)
      in "<path>/docs/.venv/lib/python3.13/site-packages/check_jsonschema/parsers/__init__.py", line 93
      >>> return loadfunc(data)

The VS Code extension allows custom tags to be defined:

"yaml.customTags": [
    "!ENV scalar",
    "!ENV sequence",
    "!relative scalar",
    "tag:yaml.org,2002:python/name:material.extensions.emoji.to_svg",
    "tag:yaml.org,2002:python/name:material.extensions.emoji.twemoji",
    "tag:yaml.org,2002:python/name:pymdownx.superfences.fence_code_format"
  ]

This is documented here: https://squidfunk.github.io/mkdocs-material/creating-your-site/?h=yaml#minimal-configuration

In the built-in pre-commit hook (which also uses ruamel.yaml), I had to pass the argument --unsafe to be able to load the file. Would that be an option?

mschoettle, Mar 03 '25 21:03

We use this hook in pre-commit for Ansible repos, among other things, and sadly it fails on the !vault tags used by ansible-vault. At the moment I don't see a workaround other than excluding the affected files, which is horrible because the GitHub Action lives in a separate, shared repo. How have others dealt with this? Is there a way to use sed (stream editing) to filter the tags out of the files as they are passed (by name) to the hook?
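
As a rough sketch of the sed-style filtering idea, the following standard-library Python function strips tags line by line before the text is handed to a validator. The helper name strip_yaml_tags is hypothetical, and the regular expression is a heuristic rather than a YAML parser: it only looks for a tag that directly follows a mapping colon or a sequence dash, and it can misfire on similar-looking text inside quoted strings.

```python
import re

# A tag such as !unsafe, !vault, !ENV or !!python/name:... that directly
# follows "key: " or "- ". Heuristic only; this does not parse YAML.
_TAG_AFTER_INDICATOR = re.compile(
    r"(?P<lead>[:-][ \t]+)!{1,2}[^\s\[\]{},]*[ \t]*"
)


def strip_yaml_tags(text: str) -> str:
    """Remove custom tags from YAML text, leaving the tagged values in place."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            # Leave full-line comments untouched.
            cleaned.append(line)
            continue
        cleaned.append(_TAG_AFTER_INDICATOR.sub(r"\g<lead>", line))
    return "".join(cleaned)


sample = (
    "controller_templates:\n"
    "  - name: My job template\n"
    "    extra_vars:\n"
    "      pass: !unsafe '{{ my_value }}'\n"
)
print(strip_yaml_tags(sample), end="")
```

A filter like this would have to write the cleaned text to a temporary file (or pipe it) before validation, since the hook receives file names. Note also that a value consisting only of a tag, such as !!python/name:pymdownx.superfences.fence_code_format, becomes an empty (null) value once the tag is removed, which a schema may still reject.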

phil-dotchon, Apr 01 '25 22:04