cue evaluator: incorrect behaviour when choosing defaults

What version of CUE are you using (`cue version`)?

$ cue version
cue version v0.0.0-20240712164527-719893f23850

go version go1.22.3
      -buildmode exe
       -compiler gc
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH arm64
            GOOS linux
             vcs git
    vcs.revision 719893f23850172d224720e6d1257586179ac895
        vcs.time 2024-07-12T16:45:27Z
    vcs.modified false
cue.lang.version v0.10.0

Does this issue reproduce with the latest release?

Yes

What did you do?

! exec cue export x.cue
stderr 'b: 2 errors in empty disjunction'
stderr 'b: conflicting values 4 and 1'
stderr 'b: conflicting values 5 and 1'

-- x.cue --
a: *5 | int
if a == 5 {
    b: 1
}
b: a | *4

What did you expect to see?

A passing test (for some variation perhaps of the error messages).

What did you see instead?

> ! exec cue export x.cue
[stdout]
{
    "a": 5,
    "b": 1
}
FAIL: /tmp/testscript4268956037/repro.txtar/script.txtar:1: unexpected command success

The output of b: 1 cannot be correct given that a: 5, and b is constrained such that b: a | *4.

To confirm: the exact same behaviour is seen with CUE_EXPERIMENT=evalv3, i.e. the bug exists in both old and new evaluator implementations.

Another observation: there is nowhere in the spec where we can refer to try and reason about what the behaviour should be here.

Jul 17 '24 11:07 myitcv

So, I think we have this situation, following the details of https://cuelang.org/docs/reference/spec/#default-values

a: int | *5    <int | 5,  5>
b: a | *4      <int | 4 | 5, 4 | 5>

a resolves to 5. The problem seems to be what then happens: presumably at this point we should actually do:

b 🡒 <5> & <int | 4 | 5, 4 | 5> 🡒 <5 & (int | 4 | 5), 5 & (4 | 5)> 🡒 <5, 5> (rule U1)

but I think this step isn't happening. I.e. we don't seem to be tracking that b depends on a, at least not sufficiently.

The result is that we pass into the if a == 5 block and add a <1> unification without having eliminated the int. Consequently we get:

b 🡒 <1> & <int | 4 | 5, 4 | 5> 🡒 <1 & (int | 4 | 5), 1 & (4 | 5)> 🡒 <1, ⊥> (rule U1)

whereas we should be attempting <1> & <5, 5> which should lead to ⊥.

Jul 23 '24 15:07 cuematthew

My hunch is that the following happens.

Some time after evaluation starts, we realise can't make any more "progress". At that point, we have the following facts (I leave out the if comprehension on purpose to focus more on what we know about a and b):

a: *5 | int
b: a | *4

At this point, my intuition is that what we should do is choose a default for a but not b, and leave the "reference" from b to a intact. Reason being, a is "not done yet" and the existence of the a to b dependency means that after choosing defaults we might know more. I can't see anything in the spec however which would confirm this rule of ordering however.

Instead what I think currently happens is that we "erase" the reference to a too early (much like you suggest). Indeed that must happen before we have chosen the default for a because otherwise we would know that a: 5. So we end up with, in effect:

a: *5 | int
b:  *5 | int | *4

Then the default for a is being chosen leaving us at:

a: 5
b: *5 | int | *4

At which point we can evaluate the original if comprehension which leaves us at:

a: 5
b: *5 | int | *4
b: 1

This reduces down to:

a: 5
b: 1

without us even needing to revert to choosing a default for b.

Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem.

Here is an interesting follow-up example on the sequencing of default selection:

c: *5 | d
d: c | *4
if c == 5 {
	d: 1
}

In this case we should again get an error as the result, but we would choose defaults for a and b at the same time. i.e. we would get to a point in evaluation where we know:

c: *5 | d
d: c | *4

and not be able to make any further progress. Because of the mutual references, neither default selection orders before the other, so we choose defaults for both simultaneously. At which point we know:

c: 5
d: 4

Now we can evaluate the if comprehension and end up with:

c: 5
d: 4
d: 1

and hence error.

Jul 24 '24 05:07 myitcv

Here's another example, maybe related, which had a similar effect, but only with the old evaluator:

import "strconv"
x: *strconv.Atoi(y) | "1"
y: x

With the old evaluator cue export of the above produces:

{
    "x": 1,
    "y": "1"
}

which similarly fails to validate if vetted along with the original.

With CUE_EXPERIMENT=evalv3, it produces a cycle error, which seems about right.

Maybe the original example in this issue should also be a cycle error?

Jul 24 '24 07:07 rogpeppe

Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem

For me, this is the key problem: it is not clear (to me?) what b: a | *4 means: it could mean that b gets a deep copy of a before a has been evaluated at all. I think this is similar to referencing a definition - you're copying it and each copy is unrelated and so can be unified/evaluated in different ways. I believe that if this is the case, you can get to the a: 5, b: 1 result. Alternatively, b: a | *4 is just a reference to a's disjunction and so a and b share part of the graph, and so any evaluation of that graph should affect them both. In which case the only possible result (I think) is a: 5, b: 5.

If the latter option is chosen, note that in general, if b references a in this way (potentially transitively) and a has a default, then b's default (if declared) will never be chosen, and as such we might want to produce an error or warning of this.

Jul 24 '24 07:07 cuematthew

In which case the only possible result (I think) is a: 5, b: 5.

... but clearly, if a is 5, then b must be 1 (because of the if) and hence the above is not a valid result.

Alternatively, b: a | *4 is just a reference to a's disjunction and so a and b share part of the graph, and so any evaluation of that graph should affect them both.

FWIW my understanding is that this is (or should be) the case. In support of this, consider:

#x: {
	a: *5 | int
	b: a
}

x: #x
x: a: 1

If b just made a deep copy of a, then I think that it would end up as 5, because it wouldn't be influenced by the eventual value of a. But it ends up as 1, which seems right to me.

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.

Thus I believe that the original example should be an error of some kind (whether conflict or cycle I'm not quite sure).

Jul 24 '24 07:07 rogpeppe

In which case the only possible result (I think) is a: 5, b: 5.

... but clearly, if a is 5, then b must be 1 (because of the if) and hence the above is not a valid result.

Ah yes, sorry - too sleepy this morning and I'm leaving out the if ... when it suits me.

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to.

With the exception of references to definitions, yes?

Jul 24 '24 07:07 cuematthew

With the exception of references to definitions, yes?

No. Definitions are special only in that they recursively close their value on use of the definition, i.e. a reference to it. Otherwise, all fields (regardless of definition or not, hidden or not) unify in the same way, and a reference to a field is to the resulting/ultimate constraints "gathered" by that field.

I think one of the confusing points with this issue as I mentioned in https://github.com/cue-lang/cue/issues/3296#issuecomment-2246898444 is that it appears that a "copy" of a is happening in the reference from b before we get to the ultimate value of a. But that's exactly the bug: we shouldn't be doing that.

Jul 25 '24 13:07 myitcv

Yep, after much discussion and thought, I'm happy I'm in a less confused state wrt definitions etc.

Jul 25 '24 14:07 cuematthew

Defaults are not picked by taking a reference. Moreover, terms within a marked default that themselves are disjunctions have their defaults erased. The spec is quite explicit about this, although it could use more examples.

Binary operators resolve a references to a scalar, so in this case the reference to a is resolved to 5.

So b is equal to 5 | int | *4, which after unifying with the result from the comprehension results in 1.

Jul 26 '24 09:07 mpvl

@rogpeppe

In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.

This is definitely not the case. The & and | operations would not be very useful if this were the case. For instance, you would not be able to override a default value. In general, a reference results in a local copy of what it refers to. Logically at least. There are optimizations such as structure sharing to prevent actual copying.

Jul 26 '24 13:07 mpvl

Not to properly respond to any of the points above, rather just note down an observation based on my reading of the thread: namely that if I see:

b: a

there are at least two possible "interpretations" for the reader:

We are templating b from the constraints gathered against a
We are constraining b to be the value of a, where the phrase "the value of a" to mean the observed value of a, i.e. the result of cue export.

Per Marcel, the spec and implementation is currently adhering to 1.

My observation is that because of these two "interpretations" for the reader, then we at least have a visual ambiguity as to what it means to write b: a. An ambiguity that is compounded/made more confusing by b: a + 0 doing something else.

As such, I'm wondering if we need to tease apart these two situations with different syntax/other.

I leave this very much as an incomplete thought/comment... mainly to not forget where I got to so that I can come back to it later.

Jul 29 '24 13:07 myitcv

I had a good exchange with @rogpeppe that ended up touching on the topics raised in my previous comment.

Specifically the ambiguity of the following with respect to the original intent of the user:

b: a

In that conversation I noted that the following "sketch" would seem to fix the ambiguity, by making it it explicit what is meant by the user:

b1: a           // constrain b1 by the value of a
b2: finalize(a) // constrain b2 by the finalize-d value of a
b3: concrete(a) // constrain b3 by the recursively-closed concrete value of a

See https://github.com/cue-lang/cue/issues/943#issuecomment-2585607385 for an explanation of the hypothesised finalize() and concrete().

@rogpeppe also brought up the suggestion that a ==a validator could be an alternative way of spelling:

b4?: concrete(a)

Jan 13 '25 09:01 myitcv

@rogpeppe also brought up the suggestion that a ==a validator could be an alternative way of spelling:
b4?: concrete(a)

FWIW I'm not sure it would be quite the same. For a start, I don't think that concrete should return a closed value. But even if it did, I think the semantics would be different.

For example:

p: {
    a: b: true
    b: concrete(a)
    c: ==a
}

q: {
   p
   b: other: true
   c: other: true
}

At least with the semantics I'm supposing for concrete and ==, p.b would be OK (it's OK to add another field to an embedded closed struct) but p.c would not (even though it's OK to add the field, the resulting value of c is not equal to a so there's a conflict).

Jan 13 '25 11:01 rogpeppe

Noting in passing that @mpvl also threw the following into the mix:

We could consider the unary * to "dereference" or take a default, like b: *a.

Jun 02 '25 10:06 myitcv

To make explicit an important point touched on in the previous two comments:

a: { 
    // some structure 
}
b: a

Ignoring defaults for one second (but that's a related issue), this could mean that b is templated (and therefore constrained) to be the final value of a, such that they are structurally identical. i.e. b is not allowed to declare additional fields etc.

Saying it out loud, I think such an interpretation would be too restrictive. Structure being open by default in CUE is useful. Instead, I think we want to allow b to declare additional fields, but provide a mechanism for enforcing the "must be structurally equal", a mechanism that reflects the fact that it's closedness that is at play.

Jun 04 '25 08:06 myitcv

To make explicit an important point touched on in the previous two comments:
a: { 
    // some structure 
}
b: a
Ignoring defaults for one second (but that's a related issue), this could mean that b is templated (and therefore constrained) to be the final value of a, such that they are structurally identical. i.e. b is not allowed to declare additional fields etc.

Saying it out loud, I think such an interpretation would be too restrictive. Structure being open by default in CUE is useful. Instead, I think we want to allow b to declare additional fields, but provide a mechanism for enforcing the "must be structurally equal", a mechanism that reflects the fact that it's closedness that is at play.

Given a unary == operator, presumably that could be:

a: { 
    // some structure 
}
b: a & ==a

Jun 04 '25 09:06 rogpeppe

evaluator: incorrect behaviour when choosing defaults

What version of CUE are you using (cue version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

What did you see instead?

What version of CUE are you using (`cue version`)?