cue icon indicating copy to clipboard operation
cue copied to clipboard

Empty yaml documents should be ignored when importing lists

Open uhthomas opened this issue 1 year ago • 6 comments

What version of CUE are you using (cue version)?

$ cue version
cue version 0.9.2

go version go1.22.5
      -buildmode pie
       -compiler gc
       -trimpath true
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.9.2

Does this issue reproduce with the latest stable release?

Yes.

What did you do?

Import a list of yaml documents and transform it with -l.

❯ cue import -l "strings.ToLower(kind)" --list abc.yaml
error evaluating label strings.ToLower(kind): reference "kind" not found
abc.yaml
---
kind: Deployment
---
---
kind: Deployment

What did you expect to see?

It would be nice if this just didn't add anything to the list instead of failing. I use cue import a lot to import Helm charts and often, because Helm is just a bad text templating engine, it will include empty yaml documents. I therefore have to find the empty documents myself and run the import again.

What did you see instead?

error evaluating label strings.ToLower(kind): reference "kind" not found

uhthomas avatar Sep 25 '24 15:09 uhthomas

This seems reasonable. Do you want to send a patch with a test?

mvdan avatar Sep 26 '24 10:09 mvdan

I would like to take a look at this issue

haoqixu avatar Sep 30 '24 19:09 haoqixu

The YAML decoder decodes an empty YAML as null, making it indistinguishable from an actual null. Would it be reasonable to ignore null when importing lists?

haoqixu avatar Oct 01 '24 09:10 haoqixu

@haoqixu yes, I think that's fine. The YAML spec strongly hints that an empty document is equivalent to a null value.

mvdan avatar Oct 01 '24 09:10 mvdan

What is the expected behavior when a document is not empty? Is it possible to handle missing keys gracefully for the general case? Say:

❯ cue import -l "strings.ToLower(metadata.namespace)" --list abc.yaml
---
metadata:
    namespace: some-namespace
---
metadata:
    name: some-name # missing namespace
---
metadata:
    namespace: some-namespace

uhthomas avatar Oct 01 '24 12:10 uhthomas

If multiple behaviors are wanted depending on the use case, we could always rethink the flag slightly so that --list is equivalent to --list=all, and add another mode like --list=nonnull or --list=nonempty. Hopefully the added UX complexity is not needed though.

mvdan avatar Oct 01 '24 13:10 mvdan

I wrote a testscript to think through this a bit more before we merge the change:

exec cue import --outfile=- --list                              input-noempty.yaml
exec cue import --outfile=-        --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=fixedpath:            input-noempty.yaml

exec cue import --outfile=- --list                              input-withempty.yaml
exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml

-- input-noempty.yaml --
kind: Deployment
---
kind: Deployment
---
kind: Other

-- input-withempty.yaml --
---
kind: Deployment
---
---
kind: Deployment
---
kind: Other

As of master, the cases with empty documents fais when using --path, as reported:

> exec cue import --outfile=- --list                              input-withempty.yaml
[stdout]
[{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]
> exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:8: unexpected command failure
> exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml
[stdout]
fixedpath: [{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]

With https://review.gerrithub.io/c/cue-lang/cue/+/1202049 at patchset 3 on top of master, the results with withempty show:

> exec cue import --outfile=- --list                              input-withempty.yaml
[stdout]
[{
	kind: "Deployment"
}, null, {
	kind: "Deployment"
}, {
	kind: "Other"
}]
> exec cue import --outfile=-        --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stdout]
deployment: [{
	kind: "Deployment"
}, {
	kind: "Deployment"
}]
other: [{
	kind: "Other"
}]
> exec cue import --outfile=- --list --path=fixedpath:            input-withempty.yaml
[stdout]
fixedpath: [{
	kind: "Deployment"
}, {
	kind: "Deployment"
}, {
	kind: "Other"
}]

The first case looks oddly inconsistent now, because we're not ignoring the empty document in that case. On one hand, keeping the nulls in means that we're more directly representing the list of documents from the original YAML. On the other hand, it's a bit odd that adding --path makes some of the documents disappear, even when the path is fixed, like the fourth example.

The second case looks like it's still a bug; we should not fail, for the sake of consistency.

The third case looks OK; we are now ignoring the empty document as discussed.

My thinking is as follows: --list and --files should always ignore empty documents in multi-document inputs. --path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the --path expression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.

mvdan avatar Nov 28 '24 11:11 mvdan

My thinking is as follows: --list and --files should always ignore empty documents in multi-document inputs. --path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the --path expression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.

This approach seems reasonable to me.

haoqixu avatar Dec 02 '24 09:12 haoqixu

After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make --path skip over null values (such as empty YAML documents) when the --path argument involves a reference. That is, with --path=foo or --path=strings.ToLower(kind), but not with --path=fixedpath:.

We came to the conclusion that always ignoring null values in --list is not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.

@haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.

mvdan avatar Dec 02 '24 11:12 mvdan

I also filed https://github.com/cue-lang/cue/issues/3608 for an idea we had to extend the YAML encoding a bit too, so that it is able to omit empty or null documents entirely from an input. However, that's not needed to resolve the issue here with cue import --path.

mvdan avatar Dec 03 '24 09:12 mvdan

After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make --path skip over null values (such as empty YAML documents) when the --path argument involves a reference. That is, with --path=foo or --path=strings.ToLower(kind), but not with --path=fixedpath:.

We came to the conclusion that always ignoring null values in --list is not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.

@haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.

No problem. I have submitted another CL to add tests and will update CL 1202049 soon.

haoqixu avatar Dec 03 '24 12:12 haoqixu