cmd/cue: exporting multiple packages produces JSON with multiple root elements

Open jpluscplusm opened this issue 1 year ago • 5 comments
What version of CUE are you using (cue version)?

$ cue version
cue version v0.10.0

go version go1.23.0
      -buildmode exe
       -compiler gc
       -trimpath true
     CGO_ENABLED 0
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.10.0

Does this issue reproduce with the latest stable release?

v0.10.0 is latest.

What did you do?

! exec cue export .:a .:b --out json
cmp stderr error.txt
cmp stdout stdout.actual

exec cue export .:a .:b --out jsonl
cmp stdout stdout.golden

-- a.cue --
package a
A: true
B: true

-- b.cue --
package b
A: false
B: false

-- error.txt --
Multiple packages cannot be encoded as a single JSON text. To export multiple
packages as per-package JSON texts, specify the "jsonl" encoding:

    cue export .:package1 .:package2 --out jsonl

For information on the "jsonl" encoding, see "cue help filetypes".
-- stdout.golden --
{ "A": true, "B": true }
{ "A": false, "B": false }
-- stdout.actual --
{
    "A": true,
    "B": true
}
{
    "A": false,
    "B": false
}

What did you expect to see?

A passing test (after the line cmp stdout stdout.actual is removed), reflecting that multiple packages cannot be exported as a single valid JSON text, but only as JSONL, which can encode the multiple JSON texts implied by "exporting multiple CUE packages".

What did you see instead?

$ testscript -continue repro.txtar
> ! exec cue export .:a .:b --out json
[stdout]
{
    "A": true,
    "B": true
}
{
    "A": false,
    "B": false
}
FAIL: /tmp/testscript3108670768/repro.txtar/script.txtar:1: unexpected command success
> cmp stderr error.txt
diff stderr error.txt
--- stderr
+++ error.txt
@@ -0,0 +1,6 @@
+Multiple packages cannot be encoded as a single JSON text. To export multiple
+packages as per-package JSON texts, specify the "jsonl" encoding:
+
+    cue export .:package1 .:package2 --out jsonl
+
+For information on the "jsonl" encoding, see "cue help filetypes".

FAIL: /tmp/testscript3108670768/repro.txtar/script.txtar:2: stderr and error.txt differ
> cmp stdout stdout.actual
> exec cue export .:a .:b --out jsonl
[stdout]
{
    "A": true,
    "B": true
}
{
    "A": false,
    "B": false
}
> cmp stdout stdout.golden
diff stdout stdout.golden
--- stdout
+++ stdout.golden
@@ -1,8 +1,2 @@
-{
-    "A": true,
-    "B": true
-}
-{
-    "A": false,
-    "B": false
-}
+{ "A": true, "B": true }
+{ "A": false, "B": false }

FAIL: /tmp/testscript3108670768/repro.txtar/script.txtar:6: stdout and stdout.golden differ

Why this is a problem

The stdout.actual file in the above test script is rejected as JSON by:

  • most online "JSON validators" I've found
  • Python's json.tool: python -m json.tool stdout.actual
  • cue export, as of the recent (and IMHO correct!) change https://cuelang.org/cl/1198874
    • prior to this change, cue export used to normalise the contents of stdout.actual into a single JSON object when given it as input.
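
The rejection by Python's standard library can be reproduced without any files. The following is a minimal sketch with the contents of stdout.actual inlined as a string; the names STDOUT_ACTUAL and is_single_json_text are illustrative only and do not come from any of the tools above.

```python
import json

# Contents of stdout.actual from the reproducer: two JSON objects back to back.
STDOUT_ACTUAL = """{
    "A": true,
    "B": true
}
{
    "A": false,
    "B": false
}
"""


def is_single_json_text(text: str) -> bool:
    """Report whether text parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        # json.loads raises when data follows the first complete value.
        return False
    return True


if __name__ == "__main__":
    print(is_single_json_text(STDOUT_ACTUAL))             # prints False
    print(is_single_json_text('{"A": true, "B": true}'))  # prints True
```

json.loads tolerates trailing whitespace after the first value, so it is the second object, not the final newline, that triggers the failure.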

My reading of these documents suggests that these standards would reject stdout.actual as JSON:

  • https://json.org
  • https://ecma-international.org/publications-and-standards/standards/ecma-404/

    A JSON text is a sequence of tokens formed from Unicode code points that conforms to the JSON value grammar. A JSON value can be an object, array, number, string, true, false, or null

  • https://datatracker.ietf.org/doc/html/rfc8259

    A JSON text is a serialized value. A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true

These tools accept stdout.actual:

  • jq . stdout.actual
  • Go's encoding/json stdlib, when decoding into an any var.
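
The difference seems to come down to whether a reader treats its input as one JSON text or as a stream of values. As a rough sketch of the stream-oriented reading (written in Python for comparison with the json.tool case, and not a claim about how jq or encoding/json are implemented), the hypothetical helper iter_json_values below decodes one value at a time and carries on from where the previous value ended:

```python
import json
from typing import Any, Iterator


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each whitespace-separated JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return
        # raw_decode returns the value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


if __name__ == "__main__":
    stream = '{\n    "A": true\n}\n{\n    "A": false\n}\n'
    for value in iter_json_values(stream):
        print(value)
```

A reader built this way accepts stdout.actual and sees two values, while a reader that insists on a single JSON text sees an error after the first closing brace.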

Clearly there's disagreement as to the JSON-ness of the output across different tooling in the ecosystem, and it's this uncertainty that I feel merits this issue being addressed. Given that one of its roles is "gluing other tools, processes and workflows together", the cue command should produce definitively and defensively correct output that cannot possibly confuse its downstream consumers.

A possible solution

I feel the cue command could be 100% technically correct if it allowed the above test script to pass (modulo the cmp stdout stdout.actual, which is only present to provide context for the rest of the issue). In other words:

  1. When asked to export multiple packages, emit JSONL if requested

AND

  2. When asked to export multiple packages as JSON (either as a default or with --out json) fail noisily, with a pointer towards the alternative JSONL encoding.
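
To make the two rules concrete, here is a small Python model of the proposed encoder behaviour. It is only a sketch under stated assumptions: encode_values and MultipleValuesError are hypothetical names, the error wording is invented, and none of this reflects the actual cue implementation.

```python
import json
from typing import Any, Sequence


class MultipleValuesError(ValueError):
    """Raised when several values are requested as a single JSON text."""


def encode_values(values: Sequence[Any], out: str = "json") -> str:
    """Encode values as "json" (exactly one value) or "jsonl" (one per line)."""
    if out == "jsonl":
        # One compact JSON text per line, each terminated by a newline.
        return "".join(json.dumps(v) + "\n" for v in values)
    if out == "json":
        if len(values) != 1:
            raise MultipleValuesError(
                f"{len(values)} values cannot be encoded as a single JSON text; "
                'use the "jsonl" encoding instead'
            )
        return json.dumps(values[0], indent=4) + "\n"
    raise ValueError(f"unknown encoding: {out!r}")


if __name__ == "__main__":
    packages = [{"A": True, "B": True}, {"A": False, "B": False}]
    print(encode_values(packages, out="jsonl"), end="")
    try:
        encode_values(packages, out="json")
    except MultipleValuesError as err:
        print(f"error: {err}")
```

Python's compact form omits the spaces just inside the braces, so the JSONL lines here are not byte-identical to stdout.golden; the point is the one-value-per-line shape and the hard failure for plain JSON.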

jpluscplusm avatar Jul 12 '24 18:07 jpluscplusm

    What did you expect to see?

    JSON without multiple root elements.

As discussed the other day, it's much clearer to have your expectation written in the reproducer, such that you can then write "A passing test" in response to this question.

Returning to the issue in question: why do you think cue export should be emitting a single object in this case?

myitcv avatar Jul 18 '24 12:07 myitcv

    As discussed the other day, it's much clearer to have your expectation written in the reproducer, such that you can then write "A passing test" in response to this question.

I have updated the initial script so that it captures my expectations.

    Returning to the issue in question: why do you think cue export should be emitting a single object in this case?

I don't think it should emit a single object. I think the command should fail, because asking for JSON (via --out json, or via the default of no --out flag) is asking for a single object. The consumer implicitly wants "valid JSON", which the cue export invocation doesn't produce, for both the JSON-spec and ecosystem-compatibility reasons outlined in the issue.

jpluscplusm avatar Aug 19 '24 11:08 jpluscplusm

@mvdan is there any overlap here with your observation of a bug with JSON being treated as JSONL?

myitcv avatar Aug 19 '24 13:08 myitcv

Indeed, I think this is the reverse of the bug I fixed at https://review.gerrithub.io/c/cue-lang/cue/+/1198874. I fixed the JSON decoder so that it rejects zero or many values, but I didn't think to check the encoder as well. I'll send a fix.

mvdan avatar Aug 19 '24 13:08 mvdan

    I'll send a fix.

I think it's worth me noting that all the potential resolutions that I could imagine have non-trivial side effects:

  1. The status quo.

    • Effect: cue export .:p1 .:p2 | cue export json: - fails.
    • Effect: cue export .:p1 .:p2 > file.json; cue export file.json fails.
    • Effect: cue export .:p1 .:p2 | cue export - succeeds, with a loss of information as the 2 input CUE texts are unified.
    • Effect: users who request JSON (--out json) or expect JSON (no --out flag) are disappointed, as the stream they're handed isn't JSON.

  2. My suggested option is outlined in the issue:

    • cue export .:p1 .:p2 and cue export .:p1 .:p2 --out json both fail with an error message
    • cue export .:p1 .:p2 --out jsonl emits a single line per package

    This continues disallowing cue export .:p1 .:p2 some.data.file --out jsonl, as multiple packages can't be mixed with other input types. Effect: users currently relying on cue export .:p1 .:p2 start seeing command failures.

  3. cue export .:p1 .:p2 and cue export .:p1 .:p2 --out json both emit JSONL. Effect: users who request JSON (--out json) or expect JSON (no --out flag) are disappointed, as the stream they're handed isn't JSON.

  4. cue export .:p1 .:p2 unifies the packages and emits a single stream of actual-JSON. Effect: a change of semantics. I think this would be a positive change as it would increase the consistency of the cue export command, and might then allow for non-package-based inputs to be combined with multiple packages. This would then offer the same "all arguments get unified" approach that every other cue export invocation delivers. I didn't propose it initially because it's a semantic change to a core cue subcommand.

jpluscplusm avatar Aug 20 '24 12:08 jpluscplusm

Many of the package-scoped notes above also apply to cue export with multiple -e params:

exec cue export --out json -e A -e B
cmp stdout out

-- file.cue --
package p

A: [ 1,2,3 ]
B: c: d: 42
-- out --
[
    1,
    2,
    3
]
{
    "c": {
        "d": 42
    }
}

JSON is requested, but not emitted (by the standards of the tooling mentioned in this issue's initial body).

jpluscplusm avatar Sep 29 '25 14:09 jpluscplusm

Thanks for thinking through some options in https://github.com/cue-lang/cue/issues/3288#issuecomment-2298727114. I don't think breaking cue export with multiple packages is an option; cue export ./... is a fairly common pattern, so that suddenly breaking would be really unfortunate.

Some users do specify an encoding, e.g. cue export --out yaml ./..., but others do not. And we haven't really gotten complaints from other users about the default behavior of cue export with multiple packages being broken due to the invalid JSON. Breaking cue export --out json ./... with many values is reasonable, though, because it could be argued that it should never have worked in the first place.

So my suggestion to fix the "invalid JSON output" problem would be to tweak the documented defaults such that cue export keeps its current behavior, but it can be forced into valid JSON via cue export --out json, which would fail when trying to encode more than one value.
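
Roughly, that split between the default and an explicit --out json could be modelled as follows. This is a Python sketch of the proposed behaviour only: the function name export, the use of out=None to stand for "no --out flag", and the error text are all assumptions made for illustration, not cue's code.

```python
import json
from typing import Any, Optional, Sequence


def export(values: Sequence[Any], out: Optional[str] = None) -> str:
    """Render values; out=None models running without an --out flag."""
    if out not in (None, "json"):
        raise ValueError(f"encoding not covered by this sketch: {out!r}")
    if out == "json" and len(values) > 1:
        # Explicitly requested JSON must be a single JSON text.
        raise ValueError("more than one value cannot be encoded as one JSON text")
    # The default keeps today's output: indented values, one after another.
    return "".join(json.dumps(v, indent=4) + "\n" for v in values)


if __name__ == "__main__":
    packages = [{"A": True}, {"A": False}]
    print(export(packages), end="")  # default: both values are emitted
    try:
        export(packages, out="json")
    except ValueError as err:
        print(f"error: {err}")
```

Under this model existing cue export ./... invocations keep working, and only callers who explicitly ask for JSON get the stricter single-value guarantee.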

mvdan avatar Sep 29 '25 14:09 mvdan

Another option would be to change the default output format for cue export, but that would in itself be a breaking change, and we don't have good options:

  • YAML supports streaming multiple values, but it's a rather complex format
  • CUE doesn't support streaming multiple values, interestingly enough; we currently "split" outputs via // ---
  • There isn't another well-supported encoding format with streaming support

mvdan avatar Sep 29 '25 14:09 mvdan