evaluator: incorrect behaviour when choosing defaults
What version of CUE are you using (cue version)?
$ cue version
cue version v0.0.0-20240712164527-719893f23850
go version go1.22.3
-buildmode exe
-compiler gc
DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
CGO_ENABLED 1
GOARCH arm64
GOOS linux
vcs git
vcs.revision 719893f23850172d224720e6d1257586179ac895
vcs.time 2024-07-12T16:45:27Z
vcs.modified false
cue.lang.version v0.10.0
Does this issue reproduce with the latest release?
Yes
What did you do?
! exec cue export x.cue
stderr 'b: 2 errors in empty disjunction'
stderr 'b: conflicting values 4 and 1'
stderr 'b: conflicting values 5 and 1'
-- x.cue --
a: *5 | int
if a == 5 {
b: 1
}
b: a | *4
What did you expect to see?
A passing test (for some variation perhaps of the error messages).
What did you see instead?
> ! exec cue export x.cue
[stdout]
{
"a": 5,
"b": 1
}
FAIL: /tmp/testscript4268956037/repro.txtar/script.txtar:1: unexpected command success
The output of b: 1 cannot be correct given that a: 5, and b is constrained such that b: a | *4.
To confirm: the exact same behaviour is seen with CUE_EXPERIMENT=evalv3, i.e. the bug exists in both old and new evaluator implementations.
Another observation: there is nowhere in the spec where we can refer to try and reason about what the behaviour should be here.
So, I think we have this situation, following the details of https://cuelang.org/docs/reference/spec/#default-values
a: int | *5 <int | 5, 5>
b: a | *4 <int | 4 | 5, 4 | 5>
a resolves to 5. The problem seems to be what then happens: presumably at this point we should actually do:
b 🡒 <5> & <int | 4 | 5, 4 | 5> 🡒 <5 & (int | 4 | 5), 5 & (4 | 5)> 🡒 <5, 5> (rule U1)
but I think this step isn't happening. I.e. we don't seem to be tracking that b depends on a, at least not sufficiently.
The result is that we pass into the if a == 5 block and add a <1> unification without having eliminated the int. Consequently we get:
b 🡒 <1> & <int | 4 | 5, 4 | 5> 🡒 <1 & (int | 4 | 5), 1 & (4 | 5)> 🡒 <1, ⊥> (rule U1)
whereas we should be attempting <1> & <5, 5> which should lead to ⊥.
My hunch is that the following happens.
Some time after evaluation starts, we realise can't make any more "progress". At that point, we have the following facts (I leave out the if comprehension on purpose to focus more on what we know about a and b):
a: *5 | int
b: a | *4
At this point, my intuition is that what we should do is choose a default for a but not b, and leave the "reference" from b to a intact. Reason being, a is "not done yet" and the existence of the a to b dependency means that after choosing defaults we might know more. I can't see anything in the spec however which would confirm this rule of ordering however.
Instead what I think currently happens is that we "erase" the reference to a too early (much like you suggest). Indeed that must happen before we have chosen the default for a because otherwise we would know that a: 5. So we end up with, in effect:
a: *5 | int
b: *5 | int | *4
Then the default for a is being chosen leaving us at:
a: 5
b: *5 | int | *4
At which point we can evaluate the original if comprehension which leaves us at:
a: 5
b: *5 | int | *4
b: 1
This reduces down to:
a: 5
b: 1
without us even needing to revert to choosing a default for b.
Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem.
Here is an interesting follow-up example on the sequencing of default selection:
c: *5 | d
d: c | *4
if c == 5 {
d: 1
}
In this case we should again get an error as the result, but we would choose defaults for a and b at the same time. i.e. we would get to a point in evaluation where we know:
c: *5 | d
d: c | *4
and not be able to make any further progress. Because of the mutual references, neither default selection orders before the other, so we choose defaults for both simultaneously. At which point we know:
c: 5
d: 4
Now we can evaluate the if comprehension and end up with:
c: 5
d: 4
d: 1
and hence error.
Here's another example, maybe related, which had a similar effect, but only with the old evaluator:
import "strconv"
x: *strconv.Atoi(y) | "1"
y: x
With the old evaluator cue export of the above produces:
{
"x": 1,
"y": "1"
}
which similarly fails to validate if vetted along with the original.
With CUE_EXPERIMENT=evalv3, it produces a cycle error, which seems about right.
Maybe the original example in this issue should also be a cycle error?
Even if something else is happening here, I think the lack of spec clarity with respect to when defaults are chosen and when references are "erased" is a problem
For me, this is the key problem: it is not clear (to me?) what b: a | *4 means: it could mean that b gets a deep copy of a before a has been evaluated at all. I think this is similar to referencing a definition - you're copying it and each copy is unrelated and so can be unified/evaluated in different ways. I believe that if this is the case, you can get to the a: 5, b: 1 result. Alternatively, b: a | *4 is just a reference to a's disjunction and so a and b share part of the graph, and so any evaluation of that graph should affect them both. In which case the only possible result (I think) is a: 5, b: 5.
If the latter option is chosen, note that in general, if b references a in this way (potentially transitively) and a has a default, then b's default (if declared) will never be chosen, and as such we might want to produce an error or warning of this.
In which case the only possible result (I think) is
a: 5, b: 5.
... but clearly, if a is 5, then b must be 1 (because of the if) and hence the above is not a valid result.
Alternatively,
b: a | *4is just a reference toa's disjunction and soaandbshare part of the graph, and so any evaluation of that graph should affect them both.
FWIW my understanding is that this is (or should be) the case. In support of this, consider:
#x: {
a: *5 | int
b: a
}
x: #x
x: a: 1
If b just made a deep copy of a, then I think that it would end up as 5, because it wouldn't be influenced by the eventual value of a. But it ends up as 1, which seems right to me.
In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.
Thus I believe that the original example should be an error of some kind (whether conflict or cycle I'm not quite sure).
In which case the only possible result (I think) is
a: 5, b: 5.... but clearly, if
ais 5, thenbmust be1(because of theif) and hence the above is not a valid result.
Ah yes, sorry - too sleepy this morning and I'm leaving out the if ... when it suits me.
In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to.
With the exception of references to definitions, yes?
With the exception of references to definitions, yes?
No. Definitions are special only in that they recursively close their value on use of the definition, i.e. a reference to it. Otherwise, all fields (regardless of definition or not, hidden or not) unify in the same way, and a reference to a field is to the resulting/ultimate constraints "gathered" by that field.
I think one of the confusing points with this issue as I mentioned in https://github.com/cue-lang/cue/issues/3296#issuecomment-2246898444 is that it appears that a "copy" of a is happening in the reference from b before we get to the ultimate value of a. But that's exactly the bug: we shouldn't be doing that.
Yep, after much discussion and thought, I'm happy I'm in a less confused state wrt definitions etc.
Defaults are not picked by taking a reference. Moreover, terms within a marked default that themselves are disjunctions have their defaults erased. The spec is quite explicit about this, although it could use more examples.
Binary operators resolve a references to a scalar, so in this case the reference to a is resolved to 5.
So b is equal to 5 | int | *4, which after unifying with the result from the comprehension results in 1.
@rogpeppe
In general if I see a reference, ISTM that it should always refer to the final value of the thing referred to. Anything else seems like it's going to be less intuitive and useful.
This is definitely not the case. The & and | operations would not be very useful if this were the case. For instance, you would not be able to override a default value. In general, a reference results in a local copy of what it refers to. Logically at least. There are optimizations such as structure sharing to prevent actual copying.
Not to properly respond to any of the points above, rather just note down an observation based on my reading of the thread: namely that if I see:
b: a
there are at least two possible "interpretations" for the reader:
- We are templating
bfrom the constraints gathered againsta - We are constraining
bto be the value ofa, where the phrase "the value ofa" to mean the observed value ofa, i.e. the result ofcue export.
Per Marcel, the spec and implementation is currently adhering to 1.
My observation is that because of these two "interpretations" for the reader, then we at least have a visual ambiguity as to what it means to write b: a. An ambiguity that is compounded/made more confusing by b: a + 0 doing something else.
As such, I'm wondering if we need to tease apart these two situations with different syntax/other.
I leave this very much as an incomplete thought/comment... mainly to not forget where I got to so that I can come back to it later.
I had a good exchange with @rogpeppe that ended up touching on the topics raised in my previous comment.
Specifically the ambiguity of the following with respect to the original intent of the user:
b: a
In that conversation I noted that the following "sketch" would seem to fix the ambiguity, by making it it explicit what is meant by the user:
b1: a // constrain b1 by the value of a
b2: finalize(a) // constrain b2 by the finalize-d value of a
b3: concrete(a) // constrain b3 by the recursively-closed concrete value of a
See https://github.com/cue-lang/cue/issues/943#issuecomment-2585607385 for an explanation of the hypothesised finalize() and concrete().
@rogpeppe also brought up the suggestion that a ==a validator could be an alternative way of spelling:
b4?: concrete(a)
@rogpeppe also brought up the suggestion that a
==avalidator could be an alternative way of spelling:b4?: concrete(a)
FWIW I'm not sure it would be quite the same. For a start, I don't think that concrete should return a closed value. But even if it did, I think the semantics would be different.
For example:
p: {
a: b: true
b: concrete(a)
c: ==a
}
q: {
p
b: other: true
c: other: true
}
At least with the semantics I'm supposing for concrete and ==, p.b would be OK (it's OK to add another field to an embedded closed struct) but p.c would not (even though it's OK to add the field, the resulting value of c is not equal to a so there's a conflict).
Noting in passing that @mpvl also threw the following into the mix:
We could consider the unary
*to "dereference" or take a default, likeb: *a.
To make explicit an important point touched on in the previous two comments:
a: {
// some structure
}
b: a
Ignoring defaults for one second (but that's a related issue), this could mean that b is templated (and therefore constrained) to be the final value of a, such that they are structurally identical. i.e. b is not allowed to declare additional fields etc.
Saying it out loud, I think such an interpretation would be too restrictive. Structure being open by default in CUE is useful. Instead, I think we want to allow b to declare additional fields, but provide a mechanism for enforcing the "must be structurally equal", a mechanism that reflects the fact that it's closedness that is at play.
To make explicit an important point touched on in the previous two comments:
a: { // some structure } b: aIgnoring defaults for one second (but that's a related issue), this could mean that
bis templated (and therefore constrained) to be the final value ofa, such that they are structurally identical. i.e.bis not allowed to declare additional fields etc.Saying it out loud, I think such an interpretation would be too restrictive. Structure being open by default in CUE is useful. Instead, I think we want to allow
bto declare additional fields, but provide a mechanism for enforcing the "must be structurally equal", a mechanism that reflects the fact that it's closedness that is at play.
Given a unary == operator, presumably that could be:
a: {
// some structure
}
b: a & ==a