evalv3 performance on cycles
What version of CUE are you using (cue version)?
$ cue version
cue version v0.11.0-alpha.3
go version go1.23.2
-buildmode exe
-compiler gc
-trimpath true
CGO_ENABLED 0
GOARCH arm64
GOOS darwin
GOARM64 v8.0
cue.lang.version v0.11.0
Does this issue reproduce with the latest stable release?
Yes
What did you do?
When evaluating this example:
package main
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
performance without evalv3:
0.29 real 0.31 user 0.03 sys
51806208 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
2797 page reclaims
645 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
626 signals received
561 voluntary context switches
2136 involuntary context switches
4477669464 instructions retired
1141032608 cycles elapsed
35964416 peak memory footprint
with evalv3:
26.65 real 28.49 user 19.72 sys
5464260608 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
4601153 page reclaims
11 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
49995 signals received
23226 voluntary context switches
1507128 involuntary context switches
533667430315 instructions retired
149088659879 cycles elapsed
31108296192 peak memory footprint
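For context, these statistics look like macOS time -l output. A sketch of the kind of invocation that would produce them, assuming the cue eval command mentioned later in this thread and the CUE_EXPERIMENT=evalv3 opt-in used in the testscripts below (the exact invocation is an assumption, not stated verbatim in the report):
# old evaluator (the default in this version)
/usr/bin/time -l cue eval -e ts main.cue
# new evaluator, opted in via the experiment flag
CUE_EXPERIMENT=evalv3 /usr/bin/time -l cue eval -e ts main.cue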
What did you expect to see?
Similar (or better) performance (both memory and time) with the new evalv3 algorithm.
What did you see instead?
Significantly worse performance than the previous algorithm.
Notes
This might be related to #3208.
Also, #TaskSet.tasks looks like the troublemaker here. When that part of the definition is removed, performance is back to nominal (see the sketch below).
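For illustration, this is roughly what #TaskSet looks like with that part taken out; a sketch derived from the definition above, not a suggested fix:
#TaskSet: {
	action: string
	#tasks: [string]: #Task
	// Removing the following two fields (the comprehension and the
	// list constraint) is what reportedly restores nominal performance:
	// tasks: [for _, v in #tasks {v}]
	// tasks: [...#Task]
	timeSpan?: #TimeSpan
}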
Assigning to myself to attempt to reduce this a bit further.
As of 860906a7f025d8ea766aa29623c4ce4381740f89, evalv2 seems to give an error similar to https://github.com/cue-lang/cue/issues/3511, although that was an evalv3 regression, and not an error that evalv2 would give:
> env CUE_EXPERIMENT=evalv3=0
> exec cue export
[stderr]
cycle error
[exit status 1]
The new evaluator seems to want to consume all of my available RAM, so I killed it before it managed to do that. It was using about 30GiB of RAM when I killed it.
@b4nst what exactly are you running? With cue export with the old evaluator as per the testscript below, I get the same cycle error failure on master, v0.11.0-alpha.3, v0.10.0, and v0.9.2:
env CUE_EXPERIMENT=evalv3=0
exec cue export
-- main.cue --
package p
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
You can run this with https://pkg.go.dev/github.com/rogpeppe/go-internal/cmd/testscript.
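For example, assuming the script above is saved as repro.txtar (the file name is arbitrary):
# install the testscript runner, then execute the script
go install github.com/rogpeppe/go-internal/cmd/testscript@latest
testscript repro.txtar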
Oh sorry, I thought I had included the command. I'm running cue eval -e ts main.cue, just so it gives me the output without trying to break the cycle (which is indeed unbreakable until you set ts.timeSpan.end).
When doing an export like you did, without CUE_EXPERIMENT set to evalv3 I do get the cycle error. With CUE_EXPERIMENT set to evalv3, it starts eating all my memory.
OK, thank you. Is there a particular reason you would usually use cue eval, or is it only for the sake of this bug report? I would typically want to reproduce performance issues with cue export instead, because that's typically the command that users actually care about.
Is there a cue export command I can use? Or is the bug still present with that command?
I'll try to provide a reproducible example without leaking too many details, but basically yes, we do use export eventually. The configuration is humongous though; this was a short example that helped reproduce the error we're currently facing.
To give a little bit of context: in our use case some of the data is user provided (as a tag, or an input file). Some of that input ends up in one of a TimeSpan's start, end, or duration fields. We've been hit by multiple issues, some user related (corrupted input), some caused by configuration issues (multiple preference markers on the same field, value conflicts, etc.), and some by CUE itself (float conflicts on the exact same value, because of different precision in different loops of the same cycle). But the root cause is always a conflict that leads to a cycle error on a TimeSpan (the conflict makes the cycle unbreakable).
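As a purely hypothetical illustration of that kind of conflict (the field names come from the #TimeSpan definition above; the values are made up):
// Hypothetical: two different inputs provide conflicting values for end.
// As described above, end then never settles on a single concrete value,
// so the cycle between start, duration, and end cannot be broken.
span: #TimeSpan & {
	duration: 4.2
	end:      100.0 // value from one input
	end:      99.0  // conflicting value from another input
}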
The thing is, those errors are getting harder and harder for us to identify. Often CUE goes OOM, effectively hiding the actual issue. Sometimes it also "loses" (this is probably the wrong technical term) the context, which makes it hard to track back the origin of the cycle error.
For that reason, we started evaluating the v3 algorithm. Long story short, we experienced insane performance gains using it, which led to overall simpler identification of input/configuration issues.
However, those tests were done on a reduced and simplified code base. We cannot migrate just yet, and it seems to me the last issue comes from that tasks definition (the for loop). It seems to make CUE go OOM with evalv3, and not only when there's a cycle error (I experienced it on a code base where evalv2 was able to export properly).
Understood, thank you.
> It seems to make CUE go OOM with evalv3, and not only when there's a cycle error (I experienced it on a code base where evalv2 was able to export properly).
If you were able to share this with us, either as a reduced/simplified example you can share publicly, or privately with me or via https://cuelabs.dev/unity/, that would be really helpful.
In the meantime, I'll try to reduce the cue eval example you already shared above, but ideally we would be testing your entire private project via Unity - which allows us to really ensure that your entire config works on the new evaluator, and test it for regressions with upcoming evaluator changes.
Thanks a lot! I'll work tomorrow on a reproducible example we can put in the Unity project. It'll help me cut the refactor down anyway.
OK, I think I can validate the behaviour with this example and export. By adding ts: timeSpan: end: 100.0 to break the cycle, both evalv2 and evalv3 show similar performance:
env CUE_EXPERIMENT=evalv3
exec time -l cue export -e ts
-- main.cue --
package p
#TimeSpan: {
start: float
duration: float
end: float
start: end - duration
duration: end - start
end: start + duration
start: <=end
}
#Task: {
action: string
timeSpan: #TimeSpan
}
#TaskSet: {
action: string
#tasks: [string]: #Task
tasks: [for _, v in #tasks {v}]
tasks: [...#Task]
timeSpan?: #TimeSpan
}
#T1: #Task & {
action: "Task 1"
timeSpan: #TimeSpan & {
duration: 4.2
}
}
#T2: #Task & {
action: "Task 2"
timeSpan: #TimeSpan & {
duration: 120.0
}
}
ts: #TaskSet
ts: this={
action: "Task 1 and Task 2"
#tasks: {
t1: #T1 & {
timeSpan: end: this.#tasks.t2.timeSpan.start
}
t2: #T2 & {
timeSpan: end: this.timeSpan.end
}
}
timeSpan: start: this.#tasks.t1.timeSpan.start
}
ts: timeSpan: end: 100.0
0.02 real 0.02 user 0.00 sys
21364736 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1569 page reclaims
3 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
33 signals received
39 voluntary context switches
222 involuntary context switches
153678554 instructions retired
62430448 cycles elapsed
11503104 peak memory footprint
Our issue comes when there is a cycle error. evalv2 can detect it, although with significantly worse performance than without the error:
ts: cycle error
0.50 real 0.62 user 0.02 sys
52264960 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
3459 page reclaims
2 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
707 signals received
544 voluntary context switches
2150 involuntary context switches
9502462162 instructions retired
2105547898 cycles elapsed
39781952 peak memory footprint
With evalv3 on the other hand, I can't get it to raise the cycle error. The process starts eating all my memory.
So our problem can be summarized as:
- evalv2:
  - nominal execution can take a lot of resources (time + memory), especially when there are a lot of nested cycles in our configuration. This example being super simple, it's not really representative of that performance gap.
  - cycle errors, when they happen, are hard to track. There is no stack trace pointing to the actual definition raising the cycle error, which often makes it hard for us to identify the configuration or user input that caused it. Add preference markers and float operations in user inputs on top of that, and it quickly becomes an actual nightmare.
- evalv3:
  - we experienced massive performance gains in nominal situations with nested cycles, on dedicated examples (I can give some, but I bet you already have tons of them). However, when applied to our code base (which currently resolves successfully with 0.4.3), it goes OOM. I do suspect a cycle that 0.4.3 was somehow able to break, but it's hard to troubleshoot.
  - when a cycle error happens, evalv3 seems unable to get to the result (= cycle error). This simple example is enough to demonstrate it. This makes the upgrade hard, as we're not able to easily identify what's currently causing it on our code base.
On another note, our legal team is currently looking into the unity project policies. So hopefully we'll soon be allowed to share more concrete examples of those situations.
Thanks again for all your help, it is really appreciated!
Hey, just for information: I can reproduce the same error with v0.11.0-alpha.4 and evalv3.
Hey @mvdan 👋, I am a colleague of b4nst. This is just a follow-up on our legal team's investigation regarding Unity: unfortunately, we cannot share more concrete examples. The example in this issue is nevertheless a concrete illustration of the problem; we will keep you posted on our own checks against the next releases, and this remains a major blocker for migrating part of our codebase from 0.4.3.
Thank you for trying and for following up. We'll still try to investigate and resolve this issue as soon as we are able to. We are aware of other similar issues, such as https://github.com/cue-lang/cue/issues/3610, which we reduced from a codebase of our own.
I noticed the counters (number of "CUE ops") were low, so it was not an issue with the CUE algorithm itself. A bit of profiling suggests this is a problem with appendToList, which is only used for errors.
Changing "cue/errors".Append to simply return a or b, instead of creating a list, reduces the running time to 0.3s again. Not sure why yet; I still have to investigate.
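Roughly, the experiment amounts to something like the following sketch (the exact signature of "cue/errors".Append is assumed here; this drops error details and is a diagnostic hack, not a proposed fix):
// Diagnostic sketch inside the cue/errors package: short-circuit Append
// so it never builds a combined error list via appendToList, to confirm
// that list construction dominates the running time.
func Append(a, b Error) Error {
	if a == nil {
		return b
	}
	return a
}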