cue icon indicating copy to clipboard operation
cue copied to clipboard

syntax: syntax converted from value lost map struct

Open FogDong opened this issue 3 years ago • 7 comments

What version of CUE are you using (cue version)?

$ cue version
0.4.3

Does this issue reproduce with the latest release?

yes

What did you do?

func main() {
	ctx := cuecontext.New()
	v := ctx.CompileString(`
parameter: [string]:string
`)

	formatting, err := format.Node(v.Syntax(cue.ResolveReferences(true)))
	fmt.Println(string(formatting))
}

What did you expect to see?

{
        parameter: [string]: string
}

What did you see instead?

{
        parameter: {}
}

I think this issue is related to #1609 , still a problem with schema and data.

FogDong avatar Jul 25 '22 07:07 FogDong

ResolveReferences is implemented by switching to data mode, which would be causing this.

You can try to not use ResolveReferences with tip, which includes the new self-containment logic.

mpvl avatar Jul 29 '22 06:07 mpvl

ResolveReferences is implemented by switching to data mode, which would be causing this.

You can try to not use ResolveReferences with tip, which includes the new self-containment logic.

Thanks for the answer.

With the latest code, the reference with schema works fine when the parameters come from another package, but if the parameter is from the same package, the reference won't work.

Here's an example: https://gist.github.com/FogDong/eddbd24115de8fb536719a08859f7492

FogDong avatar Jul 29 '22 07:07 FogDong

I get import "mod.com/p" instead of import "mod.com". But that aside.

It is correct it will not resolve references within the exported tree within schema mode, because otherwise you are changing the meaning of the schema. The result is self-contained, though. See the discussion in #867.

mpvl avatar Jul 29 '22 07:07 mpvl

Aside from it not being correct, in general, it is not possible (when there are cycles), or may alternatively open you up for security concerns. (billion laughs attack), if you resolve references indiscriminately.

What is the reason why you would like to resolve "internal" references?

What are you willing to give up when resolving references? For instance:

  • security is not a concern
  • constraints can be partially dropped
  • cyclic (but structurally sound) references can be disallowed

@FogDong/ @wonderflow: Or put it more concretely, to narrow down your use case: how would you want to see the following cases see resolved?:

Case 1: expressions

a: int
b: int
c: a + b

Case 2: cyclic

list: {
    value: _
    next: list | null
}

Case 3: billion laughs attack vector

a: ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: [a, a, a, a, a, a, a, a]
c: [b, b, b, b, b, b, b, b]
d: [c, c, c, c, c, c, c, c]
e: [d, d, d, d, d, d, d, d]
f: [e, e, e, e, e, e, e, e]
g: [f, f, f, f, f, f, f, f]
h: [g, g, g, g, g, g, g, g]
i: [h, h, h, h, h, h, h, h]

Note that in the latter case, evaluation would currently blow up just like it would for YAML. But it is possible in CUE to evaluate CUE in a way that this does not blow up and plan to address this with the structure sharing optimization. So it is important to think ahead and also prevent this kind of attack on other levels of CUE interpretation.

Note that all the above examples are self-contained, even though they do not resolve references.

mpvl avatar Jul 29 '22 11:07 mpvl

Aside from it not being correct, in general, it is not possible (when there are cycles), or may alternatively open you up for security concerns. (billion laughs attack), if you resolve references indiscriminately.

What is the reason why you would like to resolve "internal" references?

What are you willing to give up when resolving references? For instance:

  • security is not a concern
  • constraints can be partially dropped
  • cyclic (but structurally sound) references can be disallowed

@FogDong/ @wonderflow: Or put it more concretely, to narrow down your use case: how would you want to see the following cases see resolved?:

Case 1: expressions

a: int
b: int
c: a + b

Case 2: cyclic

list: {
    value: _
    next: list | null
}

Case 3: billion laughs attack vector

a: ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: [a, a, a, a, a, a, a, a]
c: [b, b, b, b, b, b, b, b]
d: [c, c, c, c, c, c, c, c]
e: [d, d, d, d, d, d, d, d]
f: [e, e, e, e, e, e, e, e]
g: [f, f, f, f, f, f, f, f]
h: [g, g, g, g, g, g, g, g]
i: [h, h, h, h, h, h, h, h]

Note that in the latter case, evaluation would currently blow up just like it would for YAML. But it is possible in CUE to evaluate CUE in a way that this does not blow up and plan to address this with the structure sharing optimization. So it is important to think ahead and also prevent this kind of attack on other levels of CUE interpretation.

Note that all the above examples are self-contained, even though they do not resolve references.

Thanks for the explanation, after talking with Paul earlier, we learned that we should use ResolveReference sparingly, in fact, it's better to deprecate the use of ResolveReference in our code.

We used to use ResolveReference to convert the cue values to string, and later in our code, convert the string to cue value the second time. In the earlier version of cue like v0.2.2, the use of ResolveReference can allow users to do this, but as cue updates, it becomes more designed and strict, and the old way of using ResolveReference can cause many problems as you commented, so it is now designed to convert to the data mode, not the schema mode. The original use in our code is not cue-native enough, I'll try to use InlineImports to refactor our code.

FogDong avatar Jul 29 '22 14:07 FogDong

Just to follow-up here. @FogDong and I had a call earlier where we discussed this in the context of the following example:

go mod tidy
go run .
cmp stdout stdout.golden

-- go.mod --
module mod.com

go 1.18

require cuelang.org/go v0.4.4-0.20220729051708-0a46a1624353

require (
	github.com/cockroachdb/apd/v2 v2.0.1 // indirect
	github.com/google/uuid v1.2.0 // indirect
	github.com/mpvl/unique v0.0.0-20150818121801-cbe035fff7de // indirect
	github.com/pkg/errors v0.8.1 // indirect
	golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4 // indirect
	golang.org/x/text v0.3.7 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)
-- main.go --
package main

import (
	"fmt"
	"log"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/cue/format"
)

func main() {
	ctx := cuecontext.New()

	selfContain := func(v cue.Value) (v2 cue.Value, err error) {
		n := v.Syntax(cue.InlineImports(true))
		b, err := format.Node(n)
		if err != nil {
			return v2, err
		}
		v2 = ctx.CompileBytes(b)
		return v2, nil
	}
	v1 := ctx.CompileString(`
		#aa: string
		d: {
			a: string
			b? : #aa
		}
	`)
	v2 := ctx.CompileString(`
		b: "test"
		c: "another test"
	`)
	v3, err := selfContain(v1.LookupPath(cue.ParsePath("d")))
	fmt.Println("== d ==")
	fmt.Printf("%#v\n", v3)
	if err != nil {
		log.Fatal(err)
	}
	x := v3.Unify(v2)
	fmt.Println("== x ==")
	fmt.Printf("%v\n", x)
}
-- stdout.golden --
== d ==
let AA = {
	#x: string
}
a:  string
b?: AA.#x
== x ==
{
	a: string
	b: "test"
	c: "another test"
}

This example shows how via the new cue.InlineImports() option the value d can be made self-contained. i.e. the reference to #aa outside of d in the source input is "resolved" and inlined by the Syntax() step. You can see this in the output value of d, which has a reference to the inlined let.

Incidentally, @mpvl, I'm now less sure on the name InlineImports() (and the docs that go with it). The example above does not use imports at all, and yet the option is responsible for self-containing the value d. So I think we perhaps need to rename and expand the docs?

myitcv avatar Jul 29 '22 16:07 myitcv

Incidentally, @mpvl, I'm now less sure on the name InlineImports() (and the docs that go with it). The example above does not use imports at all, and yet the option is responsible for self-containing the value d. So I think we perhaps need to rename and expand the docs?

Noting for others following along here, my comment is actually invalid. The "self contained" changes that landed in https://github.com/cue-lang/cue/commit/c71f6745235d5d2dc48ad044d93efc50485fdd7a are now the default behaviour. i.e. cue.Value.Syntax() always self-contains the value. The additional option cue.InlineImports() also does just that - inline imports.

myitcv avatar Aug 02 '22 14:08 myitcv

Per our discussions in various other issues, I think we have identified a better approach via the cue.Value API so I will mark this as closed. Please comment if this needs to remain open with details of outstanding issues. Thanks

myitcv avatar Oct 06 '22 07:10 myitcv