cue icon indicating copy to clipboard operation
cue copied to clipboard

cue: syntax converted from value lost close function

Open leejanee opened this issue 2 years ago • 21 comments

What version of CUE are you using (cue version)?

$ cue version
v0.4.3-beta.1

Does this issue reproduce with the latest release?

ctx := cuecontext.New()
v := ctx.CompileString(`
foo: close({
   value: 100
})
`)

//formats node in canonical cue fmt style
formatting,err:=format.Node(v.Syntax(cue.ResolveReferences(true)))
fmt.Println(string(formatting))

Execute the above code, the result is

{
	foo: {
		value: 100
	}
} <nil>

What did you do?

What did you expect to see?

The expected result should be

{
	foo: close({
		value: 100
	})
} <nil>

This is a bug of Syntax() with option ResolveReferences?

leejanee avatar Mar 30 '22 07:03 leejanee

There are two principle ways to print the syntax of values: as schema or as data. In data mode close is dropped.

Judging from the code the use of ResolveReferences triggers data mode. Arguably that should be independent, indeed. It appears as if this approach was taken as an approximation of the meaning of ResolveReferences, but I would agree that this is not ideal.

Feel free to turn this into an issue if you need this functionality.

mpvl avatar Mar 31 '22 14:03 mpvl

Yes, I need this functionality. Actually we expect that Syntax with option ResolveReferences is Symmetric,the result of Syntax can be recompiled to be a cue.Value consistent with the original.

leejanee avatar Apr 01 '22 09:04 leejanee

The use of ResolveReferences will not generate something that is identical to the original in the general case. For instance, consider this counterexample

a: string
b: a

is not the same as

a: string
b: string

The former constraints b to be equal to a whereas the latter does not. So resolving references does not always result into something roundtripable. This is part of the reason why not much of an effort was made to preserve certain properties here, as generally you may as well go to data mode.

What I could imagine, though, is to allow something like SelfContained, meaning that it would substitute only references to package values or, in general, references to something outside of the Value. That way you would "symmetric" behavior going to and from the value.

mpvl avatar Apr 01 '22 11:04 mpvl

@mpvl Thank you very much for your explanation ,and SelfContained can present my demand better than Symmetric.

It seems that the closed is not feature of structLit in data mode .

The type in the cue is gradually narrowed, should the features that can narrow type range belong to data mode? and closed should be part of the type structLit ? similar to the string like the following example

t: string

to

t: "abc"

leejanee avatar Apr 02 '22 05:04 leejanee

The type in the cue is gradually narrowed, should the features that can narrow type range belong to data mode?

That is a great question. It was a deliberate choice to make it not part of it, but I would be open to hear arguments to the contrary. I'll explain below.

The general idea is that the only two general modes that make sense are:

  1. data mode, where everything must be concrete, and
  2. def mode, where all constraints are preserved and configurations only get increasingly more specific.

Anything in between (e.g. data when used for non-concrete values), is semantically dubious.

So assuming that everything is concrete in data mode, the only thing left to consider to include in data mode is whether structs are closed or not. To understand why we do not include it, one has to consider the purpose of closedness.

In general, proper APIs should be open to extension. In such a world, there is no room for closed structs. However, this opens up configuration writers to having typos in field names to remain uncaught. This was a problem in GCL, for instance. Closing structs allows such typos to be caught.

CUE takes an approach that aims to satisfy both desirable interpretations at once. This is analogous to the Protobuf semantics: an incoming message is allowed to have unknown fields, which are just ignored. But when looking up an unknown field in an unmarshaled message in a typed language, like Go, it will result in a compiler error. In other words, incoming messages are allowed to have unknown fields, whereas using fields in code that are unknown for the proto definition that was used to compile the type are not.

CUE works a bit the same way: within a single configuration, for schema, closedness is observed. However, for data mode it is not. This is the reason closed is dropped in data mode. Even CUE's semantics internally relies on the values not being closed in principle. There are a few cases where closedness leaks into semantics (like with disjunctions), and this is indeed irksome to me. :) But it is mostly intended to catch typos, really.

The problem with ResolveReferences is that it is a bit of a silly option. In schema/def mode, it can result in "widening" of configurations (as outlined in earlier). So it is clearly a wrong option there. But for data it makes little sense for now, as all references are always resolved in data mode.

In the future, though, ResolveReferences may make sense for data mode: it may make sense to allow references in data (graph mode) to avoid exponential blowup. (See https://en.wikipedia.org/wiki/Billion_laughs_attack.).

What seems to make sense for you though, it seems, is to allow for a SelfContained mode that resolves references only references outside of the current value, but does not resolve references that refer to values that are part of the value generated by Syntax.

Does that make sense?

mpvl avatar Apr 04 '22 11:04 mpvl

What seems to make sense for you though, it seems, is to allow for a SelfContained mode that resolves references only references outside of the current value, but does not resolve references that refer to values that are part of the value generated by Syntax.

It is great,and expect more details about SelfContained .

leejanee avatar Apr 06 '22 03:04 leejanee

If I understood clearly, this means we will introduce a SelfContained mode, in this mode, all references within the current context will be evaluated and only references that have outside variables can be left. I think this mod can work for our use case, am I right? @leejanee

wonderflow avatar Apr 06 '22 03:04 wonderflow

@wonderflow I would say it is the other way around. By example:

Assume the following is represented by root

import "x/y"

a: {
    b: y.S // has the value string
    c: b
    d: foo
}
foo: string

then

v := root.LookupPath(cue.ParsePath("a")).Syntax(SelfContained(true))

would result in

b: string
c: b
d: string

So references that points to fields that are represented by the return value of Syntax are contained, whereas any references that are not are expanded.

This ensures the semantics of the value is fully retained, insofar possible for isolated values.

The following would be allowed to be generated instead:

_imported: y: S: string
_config: foo: string
b: _imported.y.S
c: b
d: _config.foo

this could avoid repeating definitions, which may be bulky, and can even avoid billion-laughs like issues.

mpvl avatar Apr 06 '22 10:04 mpvl

Note that SelfContained is a working name. We could reuse ResolveReferences to mean this. But it seems that the term SelfContained is a more appropriate name anyway, as it does not, in fact, end up resolving all references.

mpvl avatar Apr 06 '22 10:04 mpvl

@mpvl following your example

import "x/y"

a: {
    b: y.S // has the value string
    c: b
    d: foo
}
foo: string

I think the case 1 ( result as below) meets the SelfContained.

b: string
c: b
d: string

In fact, In case 2 new fields _imported and _config are introduced, this makes the structure different. For example

import "z/y"

a: {
    b1: y.S // has the value int
}

Both a in the two examples has completely different structure. After SelfContained , they will have the same fields _imported: y: S: and the type conflict.

Whether only to consider retaining the reference that cannot be resolved in case 1, as follow

ref: { 
  y: string
}

a: {
   b: ref.x
   c: ref.y
}

result in

   ref: { }

   b: ref.x
   c: string

leejanee avatar Apr 10 '22 10:04 leejanee

Hi @mpvl , are there any conclusion on this issue, actually we're pending on this for upgrade.

wonderflow avatar Apr 20 '22 05:04 wonderflow

Hi @wonderflow - I think it's worth jumping on a quick call to catch up on this issue, make sure we're clear on the context. I just DM-ed you to arrange a time.

myitcv avatar May 20 '22 07:05 myitcv

@wonderflow, @leejanee, @FogDong - ahead of talking this through, we would like to understand more context.

Our understanding of your setup is taken mainly from https://github.com/cue-lang/cue/issues/867#issuecomment-975071952. That comment describes how multiple users cooperate to complete a resource definition. Each user authors and evaluates CUE locally (you refer to this as "user-side rendering"). The result of each user's evaluation is then transmitted to a server in ast.Node format. The server then unifies the result and does something with it.

Just to be clear, by "ast.Node format" we understand (from https://github.com/cue-lang/cue/issues/867#issue-936226878) that you convert a cue.Value into syntax via cue.Value.Syntax, passing the cue.ResolveReferences(true) option, and then use cue/format.Node to encode to a []byte value.

In particular we would like to understand the motivation/reasons behind using this approach, vs. something like the following example that does not attempt to use cue.ResolveReferences and instead encodes all the client context.

The following example is a Go module that comprises two packages that are main packages.

The user program simulates the role of a user as described in https://github.com/cue-lang/cue/issues/867#issuecomment-975071952. The program takes a list of directory arguments. Each directory argument represents a user. The directory must declare a CUE module with a package at the root of the module. The evaluation of that root package represents that user's contribution to the configuration that will ultimately be evaluated by the server. If evaluation of that package does not produce any errors, a txtar archive of the configuration is created. This txtar archive is then transmitted to the server to be composed with other user's contributions.

The server program simulates the role of the server as described in https://github.com/cue-lang/cue/issues/867#issuecomment-975071952. The program takes a list of filename arguments. Each file must be a txtar archive that results from a user simulation (see the user package). The server program constructs an overlay that combines each user archive side-by-side, in a dummy CUE module call mod.com/server. It then evaluates the package at the root of mod.com/server, which in effect is the same as unifying each of the packages values at the roots of the respective user modules.

Clearly there are a number of simplifications that have been made here by using command line programs, but hopefully the essence of the scenario is retained.

The inputs to each of the user simulations matches the inputs in https://github.com/cue-lang/cue/issues/867#issuecomment-975071952.

Note we are using txtar as the encoding between user and server to demonstrate in a human-readable format what is exchanged. Alternatively, we could look to provide an equivalent of the currently deprecated cue.Runtime's Marshal and Unmarshal methods. However, the use of txtar is intentional because it serves to help highlight that dependency version skew might exist between different users unless the KubeVela environment enforces a constrain/context that prevents that - we should ensure we discuss this point.

Using this approach, references to undefined fields are retained per your comment. Furthermore, we don't need to resolve references to imports, because the imports are encoded as part of the user payload sent to the server.

Here is a link to a testscript repro of the example in full:

https://gist.github.com/myitcv/20df782d9a8ff1535ad050df64311a9c

We first simulate two users evaluating CUE in their own local contexts, the creation of the txtar payloads to be "sent" to the server, then the running of the server simulation that consumes those txtar payloads, printing the resulting CUE.

As you can see, the result of the server evaluation is:

parameter: {
	image: (*"myimage" | string) & {
		string
	}
	replicas: (*2 | >=1 & <5) & {
		3
	}
}
resource: v1.#Deployment & {
	spec: {
		replicas: parameter.replicas & parameter.replicas
		template: {
			spec: {
				containers: [{
					image: parameter.image
					name:  "main"
					envs: [...]
				}]
			}
		}
	}
}

This matches the expectation in https://github.com/cue-lang/cue/issues/867#issuecomment-975071952.

Please can you have a look through this example ahead of our call? We look forward to speaking and understanding more about the context/challenges you have been facing.

cc @mvdan who helped with this analysis.

myitcv avatar May 22 '22 08:05 myitcv

@myitcv thank you and let me explain our scenario. As the example of #867 in our case,the process contains two stages.

  • Stage 1: Execute the cue template on user side with user input as parameter and send the result to the server. The parameter is user-oriented and will have different structures(validate user input) for different user roles. For example as follow

----user1 ---

#EnvInput: {
	env: "prod" | "dev"
}

parameter: #EnvInput

resource:  {
	spec: {
                 replicas: *1 
                 if parameter.env == "prod" {
                       replicas: 3
                 }

                  if parameter.env == "dev" {
                       replicas: 2
                 }
	}
}

// if `env` equal "dev", the output is
{
   kind: Deployment
   apiVersion: apps/v1
   spec: replicas: 2
}

----user2 ---

#Workload: *"Deployment" | "ReplicaSet"

parameter: #Workload

resource:  {
	if parameter=="Deployment" {
                 v1.#Deployment
        }
        if parameter=="ReplicaSet" {
                 v1.#ReplicaSet
        }
}

// if parameter equal `Deployment`, output is
{
   kind: Deployment
   apiVersion: apps/v1
}

  • Stage 2: Process all cue code submitted from client side on server side.
{
    kind: Deployment
    apiVersion: apps/v1
} & {
     spec: replicas: 2
}

leejanee avatar May 23 '22 05:05 leejanee

Thanks @leejanee. That appears to confirm our understanding of your setup.

In the example I linked to, user input is simulated via an input.cue file for each user.

Ahead of our call, please can you look through the example and confirm whether or not it meets your needs?

myitcv avatar May 24 '22 07:05 myitcv

@myitcv Thank you for your example. And in user side, should compile input.cue and template.cue as follow

     ctx := cuecontext.New()
    // compile input.cue and template.cue
     v := ctx.CompileString(`
...`)
   //formats node in canonical cue fmt style
   formatting,_ := format.Node(v.LookUp("resource").Syntax(cue.ResolveReferences(true)))

and send the formatting to the server.

leejanee avatar May 24 '22 10:05 leejanee

@leejanee thanks. My example deliberately does not do that, because I'm trying to better understand why sending the result of format.Node (with ResolveReferences) is critical to your solution.

My example does something else: after ensuring that things validate cleanly, it "bundles" the entire user side context, and sends it to the server for evaluation. The result will be the same (indeed the approach I suggest is more likely to be correct because of some of the issues you've uncovered with ResolveReferences).

Please can you help me understand why the approach you describe in https://github.com/cue-lang/cue/issues/1609#issuecomment-1135732053 is critical to your solution?

myitcv avatar May 24 '22 11:05 myitcv

@myitcv In your example, if user1 and user2 use packages with the same path but different contents, will it cause conflicts? for example as follow ---user1---

// input.cue
import "mypackge.io/input"
parameter: {
  env: input.#env
}
// cue.mod/mypackge.io/input/kind.cue
#env: "prod" | "dev"

---user2---

// input.cue
import "mypackge.io/input"
parameter: input.#workload

// cue.mod/mypackge.io/input/kind.cue
#workload: *"Deployment" | "ReplicaSet"

leejanee avatar May 25 '22 02:05 leejanee

if user1 and user2 use packages with the same path but different contents, will it cause conflicts? for example as follow

In my example, every package from user 1 (including dependencies) is unified with the same package (determined by package path) from user 2, and vice versa. So in the case of your example, the package at path mypackge.io/input would end up evaluating as follows:

#env:      "prod" | "dev"
#workload: *"Deployment" | "ReplicaSet"

The results are not combined on a file-by-file basis, but unified at the package level.

If the server side unification results in conflicts, i.e. packages at the same package path are not compatible, then an error would occur during evaluation on the server.

I suspect you handle such cases for user input.cue already (unless there are other reasons why that situation cannot occur).

In case of conflicts in dependencies, I would be keen to understand how that situation can come about: that two users can be depending on different versions of the same package that are not compatible. In any case, I suspect a format.Node and ResolveReferences solution would also fail in case of conflicts in dependencies.

myitcv avatar May 25 '22 20:05 myitcv

Just FYI, I'm looking into a design to implement SelfContained now. Not trivial, but I'll giving it a serious whirl.

mpvl avatar Jun 22 '22 09:06 mpvl

This now appears to be fixed following the changes to support self-contained in cue.Value.Syntax():

go mod tidy
go run .
cmp stdout stdout.golden

-- go.mod --
module mod.com

go 1.18

require cuelang.org/go v0.4.4-0.20220808083940-b4d1b16ccf89

-- main.go --
package main

import (
	"fmt"

	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/cue/format"
)

func main() {
	ctx := cuecontext.New()
	v := ctx.CompileString(`
foo: close({
   value: 100
})
`)
	formatting, err := format.Node(v.Syntax())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(formatting))
}
-- stdout.golden --
{
	foo: close({
		value: 100
	})
}

Note the lack of cue.ResolveReferences(true) option: following the discussion in https://github.com/cue-lang/cue/issues/1813, we need to better understand what is actually required if you still find yourself needing that option.

myitcv avatar Aug 08 '22 16:08 myitcv

Per our discussions in various other issues, I think we have identified a better approach via the cue.Value API so I will mark this as closed. Please comment if this needs to remain open with details of outstanding issues. Thanks

myitcv avatar Oct 06 '22 08:10 myitcv