Empty yaml documents should be ignored when importing lists
What version of CUE are you using (cue version)?
$ cue version
cue version 0.9.2
go version go1.22.5
-buildmode pie
-compiler gc
-trimpath true
DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
CGO_ENABLED 1
GOARCH amd64
GOOS linux
GOAMD64 v1
cue.lang.version v0.9.2
Does this issue reproduce with the latest stable release?
Yes.
What did you do?
Import a list of yaml documents and transform it with -l.
❯ cue import -l "strings.ToLower(kind)" --list abc.yaml
error evaluating label strings.ToLower(kind): reference "kind" not found
abc.yaml
---
kind: Deployment
---
---
kind: Deployment
What did you expect to see?
It would be nice if this just didn't add anything to the list instead of failing. I use cue import a lot to import Helm charts and often, because Helm is just a bad text templating engine, it will include empty yaml documents. I therefore have to find the empty documents myself and run the import again.
What did you see instead?
error evaluating label strings.ToLower(kind): reference "kind" not found
This seems reasonable. Do you want to send a patch with a test?
I would like to take a look at this issue
The YAML decoder decodes an empty YAML as null, making it indistinguishable from an actual null. Would it be reasonable to ignore null when importing lists?
@haoqixu yes, I think that's fine. The YAML spec strongly hints that an empty document is equivalent to a null value.
What is the expected behavior when a document is not empty? Is it possible to handle missing keys gracefully for the general case? Say:
❯ cue import -l "strings.ToLower(metadata.namespace)" --list abc.yaml
---
metadata:
namespace: some-namespace
---
metadata:
name: some-name # missing namespace
---
metadata:
namespace: some-namespace
If multiple behaviors are wanted depending on the use case, we could always rethink the flag slightly so that --list is equivalent to --list=all, and add another mode like --list=nonnull or --list=nonempty. Hopefully the added UX complexity is not needed though.
I wrote a testscript to think through this a bit more before we merge the change:
exec cue import --outfile=- --list input-noempty.yaml
exec cue import --outfile=- --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-noempty.yaml
exec cue import --outfile=- --list --path=fixedpath: input-noempty.yaml
exec cue import --outfile=- --list input-withempty.yaml
exec cue import --outfile=- --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
exec cue import --outfile=- --list --path=fixedpath: input-withempty.yaml
-- input-noempty.yaml --
kind: Deployment
---
kind: Deployment
---
kind: Other
-- input-withempty.yaml --
---
kind: Deployment
---
---
kind: Deployment
---
kind: Other
As of master, the cases with empty documents fais when using --path, as reported:
> exec cue import --outfile=- --list input-withempty.yaml
[stdout]
[{
kind: "Deployment"
}, null, {
kind: "Deployment"
}, {
kind: "Other"
}]
> exec cue import --outfile=- --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:8: unexpected command failure
> exec cue import --outfile=- --list --path=fixedpath: input-withempty.yaml
[stdout]
fixedpath: [{
kind: "Deployment"
}, null, {
kind: "Deployment"
}, {
kind: "Other"
}]
With https://review.gerrithub.io/c/cue-lang/cue/+/1202049 at patchset 3 on top of master, the results with withempty show:
> exec cue import --outfile=- --list input-withempty.yaml
[stdout]
[{
kind: "Deployment"
}, null, {
kind: "Deployment"
}, {
kind: "Other"
}]
> exec cue import --outfile=- --path=strings.ToLower(kind) input-withempty.yaml
[stderr]
error evaluating label strings.ToLower(kind): reference "kind" not found
[exit status 1]
FAIL: repro-cmd.txtar:7: unexpected command failure
> exec cue import --outfile=- --list --path=strings.ToLower(kind) input-withempty.yaml
[stdout]
deployment: [{
kind: "Deployment"
}, {
kind: "Deployment"
}]
other: [{
kind: "Other"
}]
> exec cue import --outfile=- --list --path=fixedpath: input-withempty.yaml
[stdout]
fixedpath: [{
kind: "Deployment"
}, {
kind: "Deployment"
}, {
kind: "Other"
}]
The first case looks oddly inconsistent now, because we're not ignoring the empty document in that case. On one hand, keeping the nulls in means that we're more directly representing the list of documents from the original YAML. On the other hand, it's a bit odd that adding --path makes some of the documents disappear, even when the path is fixed, like the fourth example.
The second case looks like it's still a bug; we should not fail, for the sake of consistency.
The third case looks OK; we are now ignoring the empty document as discussed.
My thinking is as follows: --list and --files should always ignore empty documents in multi-document inputs. --path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the --path expression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.
My thinking is as follows:
--listand--filesshould always ignore empty documents in multi-document inputs.--path, when used on its own, needs to be a bit more intelligent: if all of the input documents fail with the--pathexpression, then it should report an error, because the user probably made a typo or mistake with the expression. However, if just some but not all of the documents fail to look up the path, then we should ignore those documents as they could be null or otherwise missing the field.
This approach seems reasonable to me.
After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make --path skip over null values (such as empty YAML documents) when the --path argument involves a reference. That is, with --path=foo or --path=strings.ToLower(kind), but not with --path=fixedpath:.
We came to the conclusion that always ignoring null values in --list is not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.
@haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.
I also filed https://github.com/cue-lang/cue/issues/3608 for an idea we had to extend the YAML encoding a bit too, so that it is able to omit empty or null documents entirely from an input. However, that's not needed to resolve the issue here with cue import --path.
After discussing with @rogpeppe and @mpvl we agreed on a simpler approach and smaller fix: make
--pathskip over null values (such as empty YAML documents) when the--pathargument involves a reference. That is, with--path=fooor--path=strings.ToLower(kind), but not with--path=fixedpath:.We came to the conclusion that always ignoring null values in
--listis not ideal, because that loses some information, and one can always filter out the nulls later if they wish to as part of interpreting the data.@haoqixu are you happy to update https://review.gerrithub.io/c/cue-lang/cue/+/1202049 accordingly? Please also add more tests, in line with the testscript I shared above, to make sure we do what we would expect in each scenario.
No problem. I have submitted another CL to add tests and will update CL 1202049 soon.