evalv3 performance on cycles
What version of CUE are you using (cue version)?
$ cue version
cue version v0.11.0-alpha.3
go version go1.23.2
-buildmode exe
-compiler gc
-trimpath true
CGO_ENABLED 0
GOARCH arm64
GOOS darwin
GOARM64 v8.0
cue.lang.version v0.11.0
Does this issue reproduce with the latest stable release?
Yes
What did you do?
When evaluating this example:
package main
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
performance without evalv3:
0.29 real 0.31 user 0.03 sys
51806208 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
2797 page reclaims
645 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
626 signals received
561 voluntary context switches
2136 involuntary context switches
4477669464 instructions retired
1141032608 cycles elapsed
35964416 peak memory footprint
with evalv3:
26.65 real 28.49 user 19.72 sys
5464260608 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
4601153 page reclaims
11 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
49995 signals received
23226 voluntary context switches
1507128 involuntary context switches
533667430315 instructions retired
149088659879 cycles elapsed
31108296192 peak memory footprint
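For context, these statistics look like macOS time -l output. A sketch of the kind of invocation that would produce them, assuming the cue eval command mentioned later in this thread and the CUE_EXPERIMENT=evalv3 opt-in used in the testscripts below (the exact invocation is an assumption, not stated verbatim in the report):
# old evaluator (the default in this version)
/usr/bin/time -l cue eval -e ts main.cue
# new evaluator, opted in via the experiment flag
CUE_EXPERIMENT=evalv3 /usr/bin/time -l cue eval -e ts main.cue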
What did you expect to see?
Similar (or better) performance (both memory and time) with the new evalv3 algorithm.
What did you see instead?
Significantly worse performance than the previous algorithm.
Notes
This might be related to #3208.
Also, #TaskSet.tasks looks like the troublemaker here. When that part of the definition is removed, performance is back to nominal (see the sketch below).
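For illustration, this is roughly what #TaskSet looks like with that part taken out; a sketch derived from the definition above, not a suggested fix:
#TaskSet: {
	action: string
	#tasks: [string]: #Task
	// Removing the following two fields (the comprehension and the
	// list constraint) is what reportedly restores nominal performance:
	// tasks: [for _, v in #tasks {v}]
	// tasks: [...#Task]
	timeSpan?: #TimeSpan
}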
Assigning to myself to attempt to reduce this a bit further.
As of 860906a7f025d8ea766aa29623c4ce4381740f89, evalv2 seems to give an error similar to https://github.com/cue-lang/cue/issues/3511, although that was an evalv3 regression, and not an error that evalv2 would give:
> env CUE_EXPERIMENT=evalv3=0
> exec cue export
[stderr]
cycle error
[exit status 1]
The new evaluator seems to want to consume all of my available RAM, so I killed it before it managed to do that. It was using about 30GiB of RAM when I killed it.
@b4nst what exactly are you running? With cue export with the old evaluator as per the testscript below, I get the same cycle error failure on master, v0.11.0-alpha.3, v0.10.0, and v0.9.2:
env CUE_EXPERIMENT=evalv3=0
exec cue export
-- main.cue --
package p
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
You can run this with https://pkg.go.dev/github.com/rogpeppe/go-internal/cmd/testscript.
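For example, assuming the script above is saved as repro.txtar (the file name is arbitrary):
# install the testscript runner, then execute the script
go install github.com/rogpeppe/go-internal/cmd/testscript@latest
testscript repro.txtar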
Oh sorry, I thought I had included the command. I'm running cue eval -e ts main.cue, just so it gives me the output without trying to break the cycle (which is indeed unbreakable until you set ts.timeSpan.end).
When doing an export like you did, without CUE_EXPERIMENT set to evalv3 I do get the cycle error. With CUE_EXPERIMENT set to evalv3, it starts eating all my memory.
OK, thank you. Is there a particular reason you would usually use cue eval, or is it only for the sake of this bug report? I would typically want to reproduce performance issues with cue export instead, because that's typically the command that users actually care about.
Is there a cue export command I can use? Or is the bug still present with that command?
I'll try to provide a reproducible example without leaking too many details, but basically yes, we do use export eventually. The configuration is humongous though; this was a short example that helped reproduce the error we're currently facing.
To give a little bit of context: in our use case some of the data is user provided (as a tag, or an input file). Some of that input ends up in one of a TimeSpan's start, end, or duration fields. We've been hit by multiple issues, some user related (corrupted input), some caused by configuration issues (multiple preference markers on the same field, value conflicts, etc.), and some by CUE itself (float conflicts on the exact same value, because of different precision in different loops of the same cycle). But the root cause is always a conflict that leads to a cycle error on a TimeSpan (the conflict makes the cycle unbreakable).
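As a purely hypothetical illustration of that kind of conflict (the field names come from the #TimeSpan definition above; the values are made up):
// Hypothetical: two different inputs provide conflicting values for end.
// As described above, end then never settles on a single concrete value,
// so the cycle between start, duration, and end cannot be broken.
span: #TimeSpan & {
	duration: 4.2
	end:      100.0 // value from one input
	end:      99.0  // conflicting value from another input
}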
The thing is, those errors are getting harder and harder for us to identify. Often CUE goes OOM, effectively hiding the actual issue. Sometimes it also "loses" (this is probably the wrong technical term) the context, which makes it hard to track back the origin of the cycle error.
For that reason, we started evaluating the v3 algorithm. Long story short, we experienced insane performance gains using it, which led to overall simpler identification of input/configuration issues.
However, those tests were done on a reduced and simplified code base. We cannot migrate just yet, and it seems to me the last issue comes from that tasks definition (the for loop). It seems to make CUE go OOM with evalv3, and not only when there's a cycle error (I experienced it on a code base where evalv2 was able to export properly).
Understood, thank you.
> It seems to make CUE go OOM with evalv3, and not only when there's a cycle error (I experienced it on a code base where evalv2 was able to export properly).
If you were able to share this with us, either as a reduced/simplified example you can share publicly, or privately with me or via https://cuelabs.dev/unity/, that would be really helpful.
In the meantime, I'll try to reduce the cue eval example you already shared above, but ideally we would be testing your entire private project via Unity - which allows us to really ensure that your entire config works on the new evaluator, and test it for regressions with upcoming evaluator changes.
Thanks a lot! I'll work tomorrow on a reproducible example we can put in the Unity project. It'll help me cut the refactor down anyway.
OK, I think I can validate the behaviour with this example and export. By adding ts: timeSpan: end: 100.0 to break the cycle, both evalv2 and evalv3 show similar performance:
env CUE_EXPERIMENT=evalv3
exec time -l cue export -e ts
-- main.cue --
package p
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
ts: timeSpan: end: 100.0
0.02 real 0.02 user 0.00 sys
21364736 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1569 page reclaims
3 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
33 signals received
39 voluntary context switches
222 involuntary context switches
153678554 instructions retired
62430448 cycles elapsed
11503104 peak memory footprint
Our issue comes when there is a cycle error. evalv2 can detect it, although with significantly worse performance than without the error:
ts: cycle error
0.50 real 0.62 user 0.02 sys
52264960 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
3459 page reclaims
2 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
707 signals received
544 voluntary context switches
2150 involuntary context switches
9502462162 instructions retired
2105547898 cycles elapsed
39781952 peak memory footprint
With evalv3 on the other hand, I can't get it to raise the cycle error. The process starts eating all my memory.
So our problem can be summarized as:
- evalv2:
  - nominal execution can take a lot of resources (time + memory), especially when there are a lot of nested cycles in our configuration. This example being super simple, it's not really representative of that performance gap.
  - cycle errors, when they happen, are hard to track. There is no stack trace pointing to the actual definition raising the cycle error, which often makes it hard for us to identify the configuration or user input that caused it. Add preference markers and float operations in user inputs on top of that, and it quickly becomes an actual nightmare.
- evalv3:
  - we experienced massive performance gains in nominal situations with nested cycles, on dedicated examples (I can give some, but I bet you already have tons of them). However, when applied to our code base (which currently resolves successfully with 0.4.3), it goes OOM. I do suspect a cycle that 0.4.3 was somehow able to break, but it's hard to troubleshoot.
  - when a cycle error happens, evalv3 seems unable to get to the result (= cycle error). This simple example is enough to demonstrate it. This makes the upgrade hard, as we're not able to easily identify what's currently causing it on our code base.
On another note, our legal team is currently looking into the unity project policies. So hopefully we'll soon be allowed to share more concrete examples of those situations.
Thanks again for all your help, it is really appreciated!
Hey, just for information: I can reproduce the same error with v0.11.0-alpha.4 and evalv3.
Hey @mvdan 👋, I am a colleague of b4nst. This is just a follow-up on our legal team's investigation regarding Unity: unfortunately, we cannot share more concrete examples. The example in this issue is nevertheless a concrete illustration of the problem; we will keep you posted on our own checks against the next releases, and this remains a major blocker for migrating part of our codebase from 0.4.3.
Thank you for trying and for following up. We'll still try to investigate and resolve this issue as soon as we are able to. We are aware of other similar issues, such as https://github.com/cue-lang/cue/issues/3610, which we reduced from a codebase of our own.
I noticed the counters (number of "CUE ops") were low, so it was not an issue with the CUE algorithm itself. A bit of profiling suggests this is a problem with appendToList, which is only used for errors.
Changing "cue/errors".Append to simply return a or b, instead of creating a list, reduces the running time to 0.3s again. Not sure why yet; I still have to investigate.
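Roughly, the experiment amounts to something like the following sketch (the exact signature of "cue/errors".Append is assumed here; this drops error details and is a diagnostic hack, not a proposed fix):
// Diagnostic sketch inside the cue/errors package: short-circuit Append
// so it never builds a combined error list via appendToList, to confirm
// that list construction dominates the running time.
func Append(a, b Error) Error {
	if a == nil {
		return b
	}
	return a
}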