dasel icon indicating copy to clipboard operation
dasel copied to clipboard

Validate doesn't correctly validate yaml files per the spec.

Open endophage opened this issue 3 years ago • 5 comments

Describe the bug The yaml spec requires that all keys in a mapping are unique (spec for reference: https://yaml.org/spec/1.2.2/). dasel validate does not catch this issue in a yaml file

To Reproduce Steps to reproduce the behavior:

$> cat <<EOF | tee tmp.yaml
key:
        foo: "bar"
key:
        foo: "baz"
EOF

$> dasel validate tmp.yaml
pass tmp.yaml

Expected behavior Validation should fail because of the duplicate key key at the top level.

Desktop (please complete the following information):

  • OS: MacOS
  • Version: development-v1.27.4-0.20221207223452-2761f5761361

Additional context This would be useful for validating kubernetes yaml files as every now and again somebody unintentionally redeclares a key in a map and kubernetes doesn't error, it just takes the last assignment.

endophage avatar Dec 09 '22 21:12 endophage

The method of validation dasel uses here is just to attempt to read the YAML and fail if it cannot be parsed, so largely the same approach kubernetes is using.

We use gopkg.in/yaml.v2 to parse YAML, perhaps there's a strict mode or something to catch these cases.

TomWright avatar Dec 19 '22 11:12 TomWright

It does indeed have a strict mode it claims will catch duplicates: https://pkg.go.dev/gopkg.in/yaml.v2#UnmarshalStrict.

Happy to make a PR if that's a change you'll accept for dasel validate

endophage avatar Dec 20 '22 18:12 endophage

Yes of course. It may be good to hide it behind a --strict flag so it doesn't catch other users off-guard

TomWright avatar Dec 20 '22 18:12 TomWright

I spent some time looking at this over Christmas. Before I spend more time on it there's a bit of a refactor required and I wanted to run that by you before making the changes.

In the init methods for each of your parsers, e.g. yaml here, you instantiate actual instances of the parser. Then in your New*Parser* methods here you simply return the instance. From there, because we're reading a file, we call FromBytes which only takes the raw input []byte.

So, there are 2 options to enable configurability on the parsers:

  1. Don't return the instance of the parser, return a factory that can be used to generate an instance given optional configuration.
  2. Add a Configure or similarly named method on the parsers that can accept options to configure the instance.
  3. Have FromBytes take additional options like ToBytes already does.

In this specific case, the addition of a "strict" mode feels like parser level configuration, not method level, i.e. a parser is configured once then can be used many times with that configuration. That means we're looking at options 1 or 2.

Option 2 is probably slightly less work but it's essentially a half assed factory, i.e. Option 1. Additionally if somebody was using dasel as a library rather than a CLI tool, option 2 makes it easier to introduce bugs in sibling code paths (one changes the configuration that is then used by the other that doesn't expect the config). Enforcing this type of configuration at instantiation, option 1, makes it clear what the state of the parser is.

Let me know what your thoughts are and I can take a crack at the changes.

endophage avatar Dec 28 '22 18:12 endophage

I agree that option 1 is probably the right way to go here.

  1. It allows us to instantiate a parser based on some options given to the factory method.
  2. Dasel would not longer instantiate all parsers regardless of which parser is used.

Thanks for looking into this 🙂

TomWright avatar Dec 28 '22 20:12 TomWright