
hashing struct values to represent equality is not straightforward

Open mxey opened this issue 6 months ago • 29 comments

What version of CUE are you using (cue version)?

$ cue version
cue version v0.13.1

go version go1.24.4
      -buildmode exe
       -compiler gc
       -trimpath true
     CGO_ENABLED 0
          GOARCH arm64
            GOOS darwin
         GOARM64 v8.0
cue.lang.version v0.13.0

Does this issue reproduce with the latest stable release?

v0.13.1 is the latest stable

What did you do?

We generate our K8s ConfigMap names based on their data. We then refer to the metadata.name field to get the right generated name.

With evalv3, cue cmd build generates YAML output where the two fields do not match, despite being a straight reference. Curiously, this only happens in cue cmd. cue export works fine.

This does not happen with evalv2. The generated name is different with evalv2 from evalv3, but it's identical in all the fields.

It also seemed to me that having multiple CUE files made a difference here.

testscript:

exec cue version

env CUE_EXPERIMENT=evalv3=0
exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
stdout app-72a88d09e6
exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
stdout app-72a88d09e6
exec cue cmd build
grep -count=2 app-72a88d09e6 output.yaml

env CUE_EXPERIMENT=evalv3=1
exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
stdout app-843db6c1f5
exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
stdout app-843db6c1f5
exec cue cmd build
grep -count=2 app-843db6c1f5 output.yaml

-- build_tool.cue --
package foo

import (
	"encoding/yaml"
	"tool/file"
)

command: build: task: {
	build: file.Create & {
		$after:   task.prepare
		filename: "output.yaml"
		contents: yaml.MarshalStream(app.objectList)
	}
}


-- instance_staging.cue --
package foo

app: {

	objects: ConfigMap: app: data: {
		KEY_JWT_ACCESS_SIGN: "foo"
	}

	objects: ConfigMap: app: data: SIGNING_KEY: "foobar"
}
-- kubernetes.cue --
package foo

import (
	"crypto/sha256"
	"strings"
	"encoding/json"
	"encoding/hex"
)

#ConfigMap: {
	apiVersion: "v1"
	kind:       "ConfigMap"

	data: [string]: string
	immutable: *true | false

	metadata: {
		#basename: string
		if immutable {
			let hash = hex.Encode(sha256.Sum256(json.Marshal([data]))[0:5])
			name: "\(#basename)-\(hash)"
		}
		if !immutable {
			name: #basename
		}
	}
}

app: {
	objectList: [
		for kind, objs in objects for name, obj in objs {
			obj
		},
	]

	objects: [KIND=string]: [NAME=string]: {
		kind: KIND
		metadata: {
			name: *NAME | strings.HasPrefix(NAME+"-")
		}
	}

	objects: ConfigMap: [NAME=string]: #ConfigMap & {metadata: #basename: NAME}

	objects: Deployment: app: spec: template: spec: containers: [{
		envFrom: [
			{configMapRef: name: objects.ConfigMap["app"].metadata.name},
		]
	}, ...]
	objects: ConfigMap: app: {
		data: {
			SIGNING_KEY!:         string
			KEY_JWT_ACCESS_SIGN!: string
		}
	}

}

What did you expect to see?

PASS

What did you see instead?

> exec cue version
[stdout]
cue version v0.13.1

go version go1.24.4
      -buildmode exe
       -compiler gc
       -trimpath true
     CGO_ENABLED 0
          GOARCH arm64
            GOOS darwin
         GOARM64 v8.0
cue.lang.version v0.13.0
> env CUE_EXPERIMENT=evalv3=0
> exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
[stdout]
app-72a88d09e6
> stdout app-72a88d09e6
> exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
[stdout]
app-72a88d09e6
> stdout app-72a88d09e6
> exec cue cmd build
> grep -count=2 app-72a88d09e6 output.yaml
> env CUE_EXPERIMENT=evalv3=1
> exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
[stdout]
app-843db6c1f5
> stdout app-843db6c1f5
> exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
[stdout]
app-843db6c1f5
> stdout app-843db6c1f5
> exec cue cmd build
> grep -count=2 app-843db6c1f5 output.yaml
[output.yaml]
apiVersion: v1
kind: ConfigMap
data:
  SIGNING_KEY: foobar
  KEY_JWT_ACCESS_SIGN: foo
immutable: true
metadata:
  name: app-72a88d09e6
---
kind: Deployment
metadata:
  name: app
spec:
  template:
    spec:
      containers:
        - envFrom:
            - configMapRef:
                name: app-843db6c1f5

FAIL: <stdin>:17: have 1 matches for `app-843db6c1f5`, want 2
failed run

mxey avatar Jun 13 '25 13:06 mxey

testscript that is less clear but not dependent on a specific hash:

exec cue version

env CUE_EXPERIMENT=evalv3=0
exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
cp stdout export-def
exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
cmp stdout export-def

exec cue cmd print-def
cmp stdout export-def

exec cue cmd print-use
cmp stdout export-def

env CUE_EXPERIMENT=evalv3=1
exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
cp stdout export-def
exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
cmp stdout export-def

exec cue cmd print-def
cmp stdout export-def

exec cue cmd print-use
cmp stdout export-def

-- build_tool.cue --
package foo

import (
	"tool/cli"
)

command: "print-def": task: {
	print: cli.Print & {
		text: app.objectList[0].metadata.name
	}
}

command: "print-use": task: {
	print: cli.Print & {
		text: app.objectList[1].spec.template.spec.containers[0].envFrom[0].configMapRef.name
	}
}


-- instance_staging.cue --
package foo

app: {

	objects: ConfigMap: app: data: {
		KEY_JWT_ACCESS_SIGN: "foo"
	}

	objects: ConfigMap: app: data: SIGNING_KEY: "foobar"
}
-- kubernetes.cue --
package foo

import (
	"crypto/sha256"
	"strings"
	"encoding/json"
	"encoding/hex"
)

#ConfigMap: {
	apiVersion: "v1"
	kind:       "ConfigMap"

	data: [string]: string
	immutable: *true | false

	metadata: {
		#basename: string
		if immutable {
			let hash = hex.Encode(sha256.Sum256(json.Marshal([data]))[0:5])
			name: "\(#basename)-\(hash)"
		}
		if !immutable {
			name: #basename
		}
	}
}

app: {
	objectList: [
		for kind, objs in objects for name, obj in objs {
			obj
		},
	]

	objects: [KIND=string]: [NAME=string]: {
		kind: KIND
		metadata: {
			name: *NAME | strings.HasPrefix(NAME+"-")
		}
	}

	objects: ConfigMap: [NAME=string]: #ConfigMap & {metadata: #basename: NAME}

	objects: Deployment: app: spec: template: spec: containers: [{
		envFrom: [
			{configMapRef: name: objects.ConfigMap["app"].metadata.name},
		]
	}, ...]
	objects: ConfigMap: app: {
		data: {
			SIGNING_KEY!:         string
			KEY_JWT_ACCESS_SIGN!: string
		}
	}

}

mxey avatar Jun 13 '25 14:06 mxey

bisected to f8defbb0ddc57b3c895f571ff7d77ed2b8e6ea69

mxey avatar Jun 13 '25 14:06 mxey

I can only reproduce it if both evalv3 and toposort are enabled.

mxey avatar Jun 13 '25 14:06 mxey

The problem arises from the json.Marshal call - the fields in the JSON do not have the same order in the different runs.

This can be seen clearly when changing let hash to only be the JSON, no hashing:

> exec cue export --out=text -e app.objects.ConfigMap.app.metadata.name
[stdout]
app-[{"KEY_JWT_ACCESS_SIGN":"foo","SIGNING_KEY":"foobar"}]
> cp stdout export-def
> exec cue export --out=text -e app.objects.Deployment.app.spec.template.spec.containers[0].envFrom[0].configMapRef.name
[stdout]
app-[{"KEY_JWT_ACCESS_SIGN":"foo","SIGNING_KEY":"foobar"}]
> cmp stdout export-def
> exec cue cmd print-def
[stdout]
app-[{"SIGNING_KEY":"foobar","KEY_JWT_ACCESS_SIGN":"foo"}]
> cmp stdout export-def
diff stdout export-def
--- stdout
+++ export-def
@@ -1,1 +1,1 @@
-app-[{"SIGNING_KEY":"foobar","KEY_JWT_ACCESS_SIGN":"foo"}]
+app-[{"KEY_JWT_ACCESS_SIGN":"foo","SIGNING_KEY":"foobar"}]

mxey avatar Jun 13 '25 14:06 mxey

We've discussed this for a bit in Slack. The issue here is that you're JSON marshaling some values, and then hashing those JSON bytes. Because toposort does not necessarily keep a consistent ordering between values, you can easily end up hashing {"one": 1,"two": 2} and later {"two": 2,"one":1}, leading to different hashes, even though the values are equal.

Ultimately, what you're trying to do here is to hash CUE values and have them result in the same hash if, and only if, the values are equal. That's not exactly easy to do today in CUE in general. Below are some pointers, which we discussed:

  • Given that you want to hash a flat struct of {[string]: string}, use a bit of CUE to flatten that into a list of key-value pairs, sort that list, and then JSON marshal and hash that list.
  • https://github.com/cue-lang/cue/issues/1909 could have caught ordering bugs like this one.
  • If we had something like json.MarshalDeterministic (https://github.com/cue-lang/cue/issues/3595), like https://pkg.go.dev/github.com/go-json-experiment/json#Deterministic, then you could use that directly on CUE values without doing the sorting yourself.
  • An alternative would be for the CUE standard library to provide out of the box hashing of structs and lists, such that structs would be sorted lexicographically first.
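
As a rough illustration of the first pointer, here is a minimal sketch. It assumes a flat struct of strings. The names `data`, `_keys`, `_flat`, and `hash` are made up for this example and do not come from the original configuration.

```cue
import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"list"
)

// Hypothetical flat input, declared across two statements as in the report.
data: SIGNING_KEY:         "foobar"
data: KEY_JWT_ACCESS_SIGN: "foo"

// Sort the keys explicitly so nothing below depends on struct field order.
_keys: list.SortStrings([for k, _ in data {k}])

// Flatten into an alternating [key, value, key, value, ...] list,
// walking the keys in sorted order.
_flat: [for k in _keys for x in [k, data[k]] {x}]

// A list has a fixed element order, so the marshaled bytes should be
// the same wherever this expression is evaluated.
hash: hex.Encode(sha256.Sum256(json.Marshal(_flat))[0:5])
```

Only lists are marshaled here, so the field order chosen by toposort should not reach the hashed bytes. This sketch does not normalize values that are themselves serialized structs.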

mvdan avatar Jun 13 '25 15:06 mvdan

First stab at a workaround:

_hasher: {
	fields: [string]: string
	l: list.SortStrings([for k, v in fields {"\(strconv.Quote(k))=\(strconv.Quote(v))"}])
	s:    strings.Join(l, ",")
	hash: hex.Encode(sha256.Sum256(s)[0:5])
}

However, I just realized the problem will continue one level down if one of the ConfigMap values is itself the result of a json.Marshal call, right?

ConfigMap: foo: data: {
	foo: json.Marshal({…})
}

mxey avatar Jun 13 '25 15:06 mxey

@mvdan On further thought, I think that the very same struct in the same CUE run should always serialize in the same way. The problem isn’t that it might change between CUE versions or from code changes, but that in a single run it was different. And specifically only when using cue cmd, not with export. That seems wrong to me.

As I understand toposort, it should not change behavior like that?

mxey avatar Jun 13 '25 19:06 mxey

You asked the same question in https://github.com/cue-lang/cue/issues/3558, so I've answered there.

mvdan avatar Jun 13 '25 20:06 mvdan

Apologies for the length of this post, but I wanted to be as clear as possible.

Assumptions

I want to first explain my mental model of CUE evaluation. This issue violated the mental model which made it downright scary.

Given this CUE code

a: 5 + 5
b: a

I think of it being evaluated in steps like this:

a: 10
b: a

a: 10
b: 10

This aligns with what the CUE tour says about references: (emphasis mine)

A reference refers to the value of the field defined within the nearest enclosing scope.

However, what actually happens during evaluation is more like this:

a: 5 + 5
b: 5 + 5

a: 10
b: 10

That is also what the CUE spec says (emphasis mine):

An identifier operand refers to a field and is called a reference. The value of a reference is a copy of the expression associated with the field that it is bound to, with any references within that expression bound to the respective copies of the fields they were originally bound to. Implementations may use a different mechanism to evaluate as long as these semantics are maintained.

Of course this distinction makes no difference in output, as long as all the expressions are idempotent (if that is the correct term?), which I assumed all CUE expressions to be, meaning they will give the same result every time.

Use case

In our Kubernetes configuration written in CUE, we do something that we have brought over from Kustomize: ConfigMaps (and Secrets) get an appendix to their name based on their contents.

Then in the PodSpec, we use that generated name to refer to the ConfigMap.

import (
  "encoding/hex"
  "crypto/sha256"
  "encoding/json"
)

objects: ConfigMap: foo: {
	let hash = hex.Encode(sha256.Sum256(json.Marshal(data))[0:5])
	metadata: name: "foo-\(hash)"
	data: {
		a: "1"
		b: "z"
	}
}

objects: Deployment: foo: spec: template: spec: containers: [
	{
		envFrom: [{configMapRef: name: objects.ConfigMap.foo.metadata.name}]
	},
]

The motivation is that a change in the configuration values will lead to a new ConfigMap being created and then the Deployment rolling out a new set of pods using that new ConfigMap.

The problem

The problem was that in one project, where we use cue cmd build to output the Kubernetes objects into a YAML file, metadata.name had a different value than the configMapRef.name in the PodSpec. You can see how this breaks the mental model I mentioned at the beginning. The end result is that when you deploy to Kubernetes, the pod refers to a ConfigMap that does not exist and fails to start.

The issue turned out to be that json.Marshal generated different output when called in metadata.name vs. configMapRef.name

Assumptions

As I discussed with @mvdan, structs in CUE do not have a predictable or fixed field order. However, as I understand the point of toposort, and with the way CUE references are documented in the tour, this behaviour still should not happen. The problem is not that with toposort enabled the order of the fields in the JSON is different than with toposort disabled; but that with it enabled, there can be two different orders in the same CUE “run”.

Investigation

I can only reproduce the issue when both evalv3 and toposort are enabled.

This might not technically be a bug based on the spec, but I think it’s still a massive footgun. I don’t think that calling json.Marshal twice on the very same struct should return different results.

Reproduction

This reproduction is not minimal, but is very small and shows the problem in context:

https://cuelang.org/play/?id=iKXMQMozzPi#w=function&i=cue&f=eval&o=cue

import (
  "encoding/json"
)

o: cm: data: b: "2"
o: cm: data: a: "1"

o: cm: {
  j: json.Marshal(data)
  data: [string]: string
}

o: ref: o.cm.j

ol: [for x in o {x}]


All json.Marshal results in the output are the same, except for one of them:

o: {
	cm: {
		data: {
			b: "2"
			a: "1"
		}
		j: "{\"b\":\"2\",\"a\":\"1\"}"
	}
	ref: "{\"b\":\"2\",\"a\":\"1\"}"
}
ol: [{
	j: "{\"a\":\"1\",\"b\":\"2\"}"
	data: {
		a: "1"
		b: "2"
	}
}, "{\"b\":\"2\",\"a\":\"1\"}"]

When disabling toposort, it comes out consistently:

{
	"o": {
		"cm": {
			"j": "{\"b\":\"2\",\"a\":\"1\"}",
			"data": {
				"b": "2",
				"a": "1"
			}
		},
		"ref": "{\"b\":\"2\",\"a\":\"1\"}"
	},
	"ol": [
		{
			"j": "{\"b\":\"2\",\"a\":\"1\"}",
			"data": {
				"b": "2",
				"a": "1"
			}
		},
		"{\"b\":\"2\",\"a\":\"1\"}"
	]
}

For completeness, CUE_DEBUG=sortfields also makes it consistent:

{
	"o": {
		"cm": {
			"data": {
				"a": "1",
				"b": "2"
			},
			"j": "{\"a\":\"1\",\"b\":\"2\"}"
		},
		"ref": "{\"a\":\"1\",\"b\":\"2\"}"
	},
	"ol": [
		{
			"data": {
				"a": "1",
				"b": "2"
			},
			"j": "{\"a\":\"1\",\"b\":\"2\"}"
		},
		"{\"a\":\"1\",\"b\":\"2\"}"
	]
}

Minimal reproduction

This is a minimal reproduction that shows that field order becomes wonky once comprehensions are involved:

data: b: "2"
data: a: "1"

one: data
two: [for d in [data] {d}][0]
three: {if true { data }}
data: {
	b: "2"
	a: "1"
}
one: {
	b: "2"
	a: "1"
}
two: {
	a: "1"
	b: "2"
}
three: {
	a: "1"
	b: "2"
}

mxey avatar Jun 16 '25 07:06 mxey

Right, my current understanding of this problem is that this might not have anything to do with comprehensions, and instead is about implicit vs explicit unification.

I define these terms as follows: "explicit unification" is where an & is involved, e.g. x & y. "Implicit unification" is everything else. Notably, when you do:

x: a: _
x: b: _

you are implicitly unifying {a: _} with {b: _}.
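
To make the two terms concrete, here is a small sketch that puts both forms side by side. The field names `x` and `y` are arbitrary and chosen only for illustration.

```cue
// Implicit unification: x is declared twice and the two
// declarations are merged into one struct.
x: a: _
x: b: _

// Explicit unification: the & operator combines two struct literals.
y: {a: _} & {b: _}
```

Both `x` and `y` end up as structs with the fields `a` and `b`. The difference discussed here is only in how much ordering information reaches the toposort code.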

I believe it's currently the case that the evaluator does not always provide the toposort code with sufficient information to be able to correctly order fields when they come from implicit unification. I wrote some code comments about this at https://github.com/cue-lang/cue/blob/master/internal/core/toposort/vertex.go#L49-L75

// 4. Implicit unification
//
//	c: {z: _, y: _}
//	c: {x: _, w: _}
//
// Here, like with embeddings, we choose that the source order is
// important, and so we must have a minimum of (z -> y), (x -> w) and
// (y -> x).
//
// Currently, the evaluator does not always provide enough information
// for us to be able to reliably identify all implicit unifications,
// especially where the ordering is enforced via some intermediate
// node. For example:
//
//	a: {
//		d: z: _
//		d: t: _
//		e: {x: _, w: _}
//	}
//	c: a.d & a.e
//
// Here, the information we get when sorting the fields of c (post
// evaluation), is insufficient to be able to establish the edge (z ->
// t), but it is sufficient to establish (x -> w). So in this case, we
// end up only with the edge (x -> w), and so the other field names
// fall back to lexicographical sorting.

Unfortunately that's what's happening here. If you add a four: { data } then you'll see that reverts to lexicographical too - the fact that data has been added in via an embedding is sufficient to ensure that toposort does not receive enough information to realise that there should be an edge b -> a.

If you change your data to:

data: {
	b: "2"
	a: "1"
}

then everything is fine and consistent.

I totally agree this is a footgun - especially the inconsistency - that's really problematic.

cuematthew avatar Jun 16 '25 08:06 cuematthew

Unfortunately that's what's happening here. If you add a four: { data } then you'll see that reverts to lexicographical too - the fact that data has been added in via an embedding is sufficient to ensure that toposort does not receive enough information to realise that there should be an edge b -> a.

OK, that explains the situation in the minimal reproduction, but does it also explain the longer example? In that, data is only ever used in one way, yet there are different outputs in the end.

mxey avatar Jun 16 '25 08:06 mxey

@cuematthew Can you also comment on my description about my mental model of CUE? Am I just on the wrong track there?

mxey avatar Jun 16 '25 09:06 mxey

OK, that explains the situation in the minimal reproduction, but does it also explain the longer example? In that, data is only ever used in one way, yet there are different outputs in the end.

I think it does explain it, yes.

In your list comprehension, you essentially have [{o.cm}, {o.ref}], because the emitted values from comprehensions are always treated as embeddings. If you change your code to ol: [{o.cm}, {o.ref}] then you get the same (inconsistent) behaviour. Instead, if you change your code to ol: [o.cm, o.ref] then it's consistently b -> a which is what you want, but this shows it's the interaction of implicit unification and embeddings that's causing certain bits of information to be lost.

Separately, if you change to o: cm: data: { b: "2", a: "1" } then you consistently get the edge b -> a.

cuematthew avatar Jun 16 '25 09:06 cuematthew

@cuematthew But why is only one of the outputs in the comprehension different?

mxey avatar Jun 16 '25 09:06 mxey

But also, do I understand correctly that if I do json.Marshal({data}), I can make it behave consistently, because toposort information will always be lost? If so, that would be a feasible workaround, because unlike CUE_DEBUG=sortfields, I can put that in our shared package.

mxey avatar Jun 16 '25 09:06 mxey

@cuematthew Can you also comment on my description about my mental model of CUE? Am I just on the wrong track there?

I try not to think about the evaluation order / strategy. The spec leaves open the question of evaluation strategy. The current implementation is, I believe, more eager than necessary, but that doesn't mean it's wrong or bad. I believe you could implement cue solely with a call-by-need ( https://en.wikipedia.org/wiki/Lazy_evaluation ) strategy.

The thing I do try to keep in mind is that you should never be able to observe an intermediate value. When you refer to something, you are always referring to the final value of that thing. In the evaluation that could mean all sorts of strategies, trying to reach fixed-points and so forth. I have some truly cursed examples if you're interested, such as:

r: L={L.a}
i: {a: a: a: a: a: x: 4}
out: r & i

The only thing you should be able to observe in out is the final value of out, which is the fixed-point reached by continuously unifying i with i.a. How the evaluator does that isn't particularly important - given the commutative properties of CUE, multiple different evaluators could take different paths through the evaluation graph, and should always produce the same result.

The thing that I find gets even more mind-melting is the current concept of "defaults" in disjunctions, and under what conditions defaults get "chosen". There is much internal discussion about this point currently.

cuematthew avatar Jun 16 '25 09:06 cuematthew

An explicit unification can apparently also be used to lose toposort information?

o: cm: data: b: "2"
o: cm: data: a: "1"

one: o.cm.data
two: o.cm.data & _
o: {
	cm: {
		data: {
			b: "2"
			a: "1"
		}
	}
}
one: {
	b: "2"
	a: "1"
}
two: {
	a: "1"
	b: "2"
}

mxey avatar Jun 16 '25 10:06 mxey

@cuematthew But why is only one of the outputs in the comprehension different?

Well quite.

I think I might need to add some further context here about the extent to which ordering of fields does or does not impact evaluation, and calculated values:

a little bit of context

There is an open question, on which we (the CUE team) don't all agree, about the dangers of comprehensions in terms of ordering of fields. Specifically, when you use a comprehension to iterate over the key-value pairs of a struct, and convert that into a list.

If you ignore that sort of comprehension (and in this bucket, I also place things like serialisation of a struct, which, fundamentally, is also converting a struct to a list (of bytes)), then ordering of fields never matters -- it never alters the semantics of CUE itself -- the CUE implementation is free to iterate over fields in any order it chooses, non-deterministically, and you'd still get the same result. In this world, ordering of fields is purely a presentational issue for outputted values. Everyone agrees that on output it's important that it's deterministic, and we can discuss and argue over how it should be controlled and influenced, but it's purely aesthetic.

As soon as you allow iteration over the fields of a struct, and the creation of a list which captures this ordering, then your ordering decisions enter your data model -- and so the iteration order suddenly starts influencing the calculation of values. There are some members of the CUE team who think this iteration order should be purely lexicographical, some of us think it should be random (as in Go), some think it should follow toposort, and some of us think the user should have to specify it explicitly all the time. There are in fact more opinions than members of the CUE team. Consequently, we have not resolved this, and so as of right now, the iteration order in a comprehension can be different from the iteration order in output.

For now, absolutely the right thing in these situations is to explicitly sort your fields. This is what Go would force you to do, and it means you are not dependent on the whims of CUE's implementation as to the iteration order. The CUE spec ( https://cuelang.org/docs/reference/spec/#comprehensions ) does not say anything about iteration order. Maybe it should, maybe it shouldn't. I believe that Go's motivation for random iteration was that there was a case in Java where lots of people became dependent on a certain iteration order (of some sort of Map), and then Sun (or Oracle) changed that iteration order and it broke lots of code. It's better to be robust against these sorts of implementation details.
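
Explicit sorting inside a comprehension could be sketched as follows. This is only an illustration: `data`, `_entries`, and `sorted` are hypothetical names, and the comparator follows the `{x, y, less}` form that `list.Sort` takes.

```cue
import "list"

// Hypothetical input built up by implicit unification.
data: b: "2"
data: a: "1"

// Capture each field as an entry so the key can be used for sorting.
_entries: [for k, v in data {key: k, value: v}]

// Order the entries by key, independent of the struct's iteration order.
sorted: list.Sort(_entries, {x: {}, y: {}, less: x.key < y.key})
```

With this approach the order of `sorted` is decided by the comparator, not by whatever order the evaluator happens to iterate over `data`.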

Now complicating the matter further, is that in your case, the list comprehension isn't important. And json.Marshal does not itself use a list comprehension internally. I believe json.Marshal uses the toposort code, and so it suffers from the limitations of toposort.

back to the current issue

If you use ol: [{o.cm}, {o.ref}], then the output I get is:

ol: [{
    j: "{\"a\":\"1\",\"b\":\"2\"}"
    data: {
        a: "1"
        b: "2"
    }
}, "{\"b\":\"2\",\"a\":\"1\"}"]

Note that data's fields (ol[0].data) are the wrong way around at this point. So in the first list item, the json string in j is consistent with data, but data has gone wrong. This is because when the copy of the original o.cm is made and then embedded, ordering information within values which are formed by implicit unification, gets lost. json.Marshal is an example of iterating over fields of a struct and ordering them. The fact that j has changed, is unfortunately exposing details of the evaluation strategy, in that embedding {o.cm} has created a copy of the original o.cm and not just reused the existing value. This is why json.Marshal has been "run again" and has produced a different result. The evaluator could have chosen to evaluate the original o.cm fully first, and then copied all the values out of it, but that's not what's happened. You could well argue that: (a) taking a copy of o.cm that loses ordering information, and then (b) having that loss of ordering information result in a changing iteration order of fields in a way that can be observed (i.e. via json.Marshal) is a violation of commutativity. My comment earlier that "you should only see the final value" is also in jeopardy here: it would appear the final value of o.cm changes due to this copying. The copying wouldn't matter if the ordering was not part of the value, but it is.

For the second list item, (i.e. {o.ref}), there is no copying of o.cm or o.cm.data going on. I don't know if for the 2nd list item, json.Marshal gets called again, or whether it is only called once here for the calculation of o.cm.j and then that string gets copied. But I'm pretty sure there is no copying of o.cm.data in this case, and so no possibility of loss of ordering information.

cuematthew avatar Jun 16 '25 10:06 cuematthew

But also, do I understand correctly that if I do json.Marshal({data}), I can make it behave consistently, because toposort information will always be lost? If so, that would be a feasible workaround, because unlike CUE_DEBUG=sortfields, I can put that in our shared package.

I would put that in the category of "very unfortunate hack", but given that I don't know of any way for you to otherwise specify the order of fields in json serialisation, I can't suggest anything better.

FWIW, I think {data} is preferable to (data & _), because I think we have a cue fix or something which removes redundant _ values, so I think the embedding is the better approach. But this is all pretty unpleasant and I'm sorry you're having to wrestle with this.
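
For reference, the embedding hack applied to the earlier reproduction might look like the sketch below. It relies on the behaviour of the current implementation as described in this thread, not on anything the spec guarantees, so treat it as a stopgap.

```cue
import "encoding/json"

o: cm: data: b: "2"
o: cm: data: a: "1"

o: cm: {
	// Embedding data in a fresh struct literal is the hack discussed
	// above: the toposort information is lost, so the fields are
	// expected to fall back to lexicographical order every time.
	j: json.Marshal({data})
	data: [string]: string
}

o: ref: o.cm.j

ol: [for x in o {x}]
```

The only change from the reproduction is `json.Marshal({data})` in place of `json.Marshal(data)`, which keeps the hack in one place if it lives in a shared package.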

cuematthew avatar Jun 16 '25 10:06 cuematthew

I would put that in the category of "very unfortunate hack", but given that I don't know of any way for you to otherwise specify the order of fields in json serialisation, I can't suggest anything better.

If it's just one level I can obviously sort the fields by name and then serialize them in a fixed order, see https://github.com/cue-lang/cue/issues/3969#issuecomment-2970732263. I don't actually need it to be JSON, I just need a value that's dependent on the contents.

However, as I understand it, the problem can still arise when there's a json.Marshal inside of the contents of the ConfigMap, like so

ConfigMap: foo: data: {
	bar: json.Marshal({…})
}

Because then if the issue strikes for the struct that is serialized into ConfigMap.foo.data.bar, sorting the fields of ConfigMap.foo.data would not help.

mxey avatar Jun 16 '25 10:06 mxey

I should also gently point out that using json.Marshal for hashing is a bad idea - json.org explicitly states:

An object is an unordered set of name/value pairs.

However, I'm not sure there's a sane alternative just now if you can't easily pipe through jq -S or similar. I don't think any of the other encodings we support help at all.

cuematthew avatar Jun 16 '25 10:06 cuematthew

My workaround doesn’t use JSON for the hashing, so that’s not a problem.

The problem is if one of the values in the struct that is hashed is a JSON-marshaled struct, I might still get conflicting hashes

mxey avatar Jun 16 '25 10:06 mxey

My workaround doesn’t use JSON for the hashing, so that’s not a problem.

The problem is if one of the values in the struct that is hashed is a JSON-marshaled struct, I might still get conflicting hashes

@cuematthew see https://cuelang.org/play/?id=WhfuH-2x6lQ#w=function&i=cue&f=eval&o=cue - I cannot find a way to break it, can you think of one? Or should this work in all scenarios?

mxey avatar Jun 16 '25 11:06 mxey

Latest iteration of the workaround:

import (
	"crypto/sha256"
	"encoding/hex"
	"list"
	"strconv"
	"strings"
)

_hasher: {
	fields!: [string]: string
	type: *"" | string

	l: list.SortStrings([for k, v in fields {strconv.Quote(k) + "=" + strconv.Quote(v)}])
	s:    type + ";" + strings.Join(l, ",")
	hash: hex.Encode(sha256.Sum256(s)[0:5])
}

testHasher: _hasher & {
	fields: b: "2"
	fields: a: "1"
	type: "foo"
	s!:   #"foo;"a"="1","b"="2""#
}

mxey avatar Jun 16 '25 13:06 mxey

Yeah, I've been trying things like this, which I feel should work with deep structures. It works with evalv2, but doesn't with evalv3. I think it's the same as issue #3393.

import "math/bits"
import "encoding/hex"
import "strconv"

#hash: L={
	in!: _
	prefix!: string
	hashcodes: [...int]
	hashcodes: [
		for k, v in in {
			if (v & {}) != _|_ {
				(#hash & {in: v, prefix: "\(L.prefix).\(k)"}).out
			}
			if (v & {}) == _|_ {
				strconv.ParseInt(hex.Encode("\(L.prefix).\(k):\(v)"), 16, 64)
			}
		}
	]
	acc: [string]: int
	acc: {
		"0": 0
		for i, h in hashcodes {
			"\(i+1)": bits.Xor(acc["\(i)"], h)
		}
	}
	out: int
	out: acc["\(len(hashcodes))"]
}

//o1: (#hash & {in: {x: 5, y: 6}, prefix: ""}).out // o1 works just fine
//o2: (#hash & {in: {z: {a: "hi"}}, prefix: ""}).out // o2 and o3 only work in evalv2
//o3: (#hash & {in: {x: 5, y: 6, z: {a: "hi"}}, prefix: ""}).out

cuematthew avatar Jun 16 '25 13:06 cuematthew

But you certainly don't need the xor stuff - sorting the individual stringified-fields as you have done would work just fine too. And in your code, you might be able to get away without the strings.Join call - you could just compare the lists directly. It's irritating me that I can't make my recursive version work with evalv3.

cuematthew avatar Jun 16 '25 13:06 cuematthew

And in your code, you might be able to get away without the strings.Join call - you could just compare the lists directly.

The whole goal is to create a hash that can be exported outside of CUE, it's not about determining equality inside of CUE. Or am I misunderstanding what you mean by “compare the lists directly”?

Yeah, I've been trying things like this, which I feel should work with deep structures.

The problem is that the “inner” field value has to be JSON, because it's a config file for an app in Kubernetes, for example.

ConfigMap: app:  {
	metadata: name: "this-is-where-the-hash-goes" // <- this part I can do without using JSON
	data: {
		"another-file": "…"
		"config.json": json.Marshal({…}) // <- but here JSON is just the format that's expected
	}
}

mxey avatar Jun 16 '25 14:06 mxey

I guess I could try to parse the inner JSON and normalize it for the hash, but it could also be YAML or TOML or whatever … But the normalization would also have to be in the actual values, because the ConfigMaps are immutable, so if the values are different we must create a new one.

At least in the actual project where we had this problem, sorting the top-level fields seems sufficient to remediate the issue.

mxey avatar Jun 16 '25 16:06 mxey

I am BTW not married to hashing the content like this, I just think it’s a good way to roll out config changes. Something based on time or the Git commit ID might also work, it would just trigger some unnecessary rollouts.

mxey avatar Jun 16 '25 17:06 mxey