cue field comprehension (list => struct) execution time is quadratic in the list size
What version of CUE are you using (cue version)?
$ cue version
cue version v0.6.0
go version go1.21.0
-buildmode exe
-compiler gc
DefaultGODEBUG panicnil=1
CGO_ENABLED 1
GOARCH amd64
GOOS darwin
GOAMD64 v1
Does this issue reproduce with the latest stable release?
Yes
What did you do?
Run the attached txtar file thus: testscript -v test-comprehension.txtar
What did you expect to see?
Execution times proportional to the data size
What did you see instead?
Execution times roughly proportional to the square of the data size
> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json
# exec head data/data.json (39.636s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
0.10 real 0.04 user 0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
0.41 real 0.44 user 0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
37.96 real 38.19 user 0.25 sys
PASS
Details:
The script test-comprehension.txtar generates data files of the specified list size, similar to
{
  "items": [
    {
      "id": "acf48c8b-4084-43b2-a3cc-29061581f9f7",
      "value": "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
    },
    {
      "id": "5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b",
      "value": "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
    },
    {
      "id": "a346e657-b754-48fc-9b55-1ad79fd7c3b9",
      "value": "140c9602-da45-461c-8735-122347f1d057"
    }
  ]
}
and uses the schema
items: [
    {
        id: string
        value: string
    },
    ...,
]

#dict: {
    for item in items {
        "\( item.id )": {value: item.value}
    }
}
to convert the data to a struct:
"acf48c8b-4084-43b2-a3cc-29061581f9f7": {
value: "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
}
"5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b": {
value: "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
}
"a346e657-b754-48fc-9b55-1ad79fd7c3b9": {
value: "140c9602-da45-461c-8735-122347f1d057"
}
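The generator main.go is attached inside the txtar rather than shown here. As a rough sketch only, assuming it does nothing more than emit -size random id/value pairs via github.com/google/uuid (the module the script downloads with go mod tidy), it could look like this:

package main

import (
    "encoding/json"
    "flag"
    "fmt"
    "log"
    "os"

    "github.com/google/uuid"
)

// item mirrors one entry of the generated "items" list.
type item struct {
    ID    string `json:"id"`
    Value string `json:"value"`
}

func main() {
    size := flag.Int("size", 200, "number of list items to generate")
    flag.Parse()

    // Fill the list with random UUID id/value pairs.
    items := make([]item, *size)
    for i := range items {
        items[i] = item{ID: uuid.NewString(), Value: uuid.NewString()}
    }

    out, err := json.MarshalIndent(map[string][]item{"items": items}, "", "  ")
    if err != nil {
        log.Fatal(err)
    }
    if err := os.MkdirAll("data", 0o755); err != nil {
        log.Fatal(err)
    }
    if err := os.WriteFile("data/data.json", out, 0o644); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Generated test data of size %d and wrote it to file data/data.json\n", *size)
}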
Running the test-comprehension.txtar.txt script with cue v0.9.0-alpha.5 still shows execution times roughly proportional to the square of the data size when evaluating this CUE schema:
items: [
    {
        id: string
        value: string
    },
    ...,
]

#dict: {
    for item in items {
        "\( item.id )": {value: item.value}
    }
}
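For what it's worth, the same evaluation can also be timed in-process through the cuelang.org/go API instead of shelling out to the cue binary. The following is only a sketch of that idea (the attached txtar itself uses time cue eval); CUE_EXPERIMENT=evalv3 should be honoured here as well, since the Go API reads the same environment variable:

package main

import (
    "fmt"
    "log"
    "os"
    "time"

    "cuelang.org/go/cue"
    "cuelang.org/go/cue/cuecontext"
)

// schema is the same schema as data/schema.cue.
const schema = `
items: [
    {
        id:    string
        value: string
    },
    ...,
]
#dict: {
    for item in items {
        "\( item.id )": {value: item.value}
    }
}
`

func main() {
    data, err := os.ReadFile("data/data.json")
    if err != nil {
        log.Fatal(err)
    }

    ctx := cuecontext.New()
    // The JSON data file is valid CUE, so it can be compiled directly and
    // unified with the schema, mirroring
    // `cue eval data/schema.cue data/data.json`.
    v := ctx.CompileString(schema).Unify(ctx.CompileBytes(data))

    start := time.Now()
    dict := v.LookupPath(cue.ParsePath("#dict"))
    // Requiring concrete values forces the comprehension to be fully evaluated.
    if err := dict.Validate(cue.Concrete(true)); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("evaluated #dict in %v\n", time.Since(start))
}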
testscript -v test-comprehension.txtar
WORK=$WORK
...
# exec cat data/schema.cue (7.166s)
> go mod tidy
[stderr]
go: finding module for package github.com/google/uuid
go: downloading github.com/google/uuid v1.6.0
go: found github.com/google/uuid in github.com/google/uuid v1.6.0
> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json
# exec head data/data.json (40.951s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
0.05 real 0.04 user 0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
0.38 real 0.41 user 0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
39.11 real 39.37 user 0.27 sys
> exec cue version
[stdout]
cue version v0.0.0-20240517091053-05453ff5230f
go version go1.22.2
-buildmode exe
-compiler gc
DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
CGO_ENABLED 1
GOARCH amd64
GOOS darwin
GOAMD64 v1
vcs git
vcs.revision 05453ff5230f212e41da5969ea7196f1a54d16b1
vcs.time 2024-05-17T09:10:53Z
vcs.modified true
cue.lang.version v0.9.0
PASS
Have you been testing the new evaluator with CUE_EXPERIMENT=evalv3? All performance work is going to that new version.
@mvdan Thank you for reminding me to use CUE_EXPERIMENT=evalv3.
The revised txtar script attached below generates more data points, and a Python script graphs the results.
- With CUE_EXPERIMENT=evalv3 the execution is almost 8 times faster (orange curves).
- However, the shape of the Size vs. Real Time curve is still close to quadratic.
Here are the scripts and the commands used to create the above plots:
test-comprehension-090.txtar.txt graph.py.txt
% testscript -v test-comprehension-090.txtar > before-evalv3.txt
% testscript -v -e CUE_EXPERIMENT=evalv3 test-comprehension-090.txtar > with-evalv3.txt
% graph.py -input before-evalv3.txt with-evalv3.txt -out graph-loglog.svg -loglog
% graph.py -input before-evalv3.txt with-evalv3.txt -out graph-lin.svg
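For completeness, here is a rough sketch (not the attached script itself; the sizes and paths are illustrative) of how the per-size timings behind these plots could be collected in Go, re-running the generator and cue eval with and without CUE_EXPERIMENT=evalv3:

package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"
    "time"
)

// evalOnce runs the same command the txtar script times, optionally with the
// new evaluator enabled, and returns the wall-clock duration.
func evalOnce(evalv3 bool) (time.Duration, error) {
    cmd := exec.Command("cue", "eval", "data/schema.cue", "data/data.json",
        "-e", "#dict", "-f", "-o", "data.cue")
    cmd.Env = os.Environ()
    if evalv3 {
        cmd.Env = append(cmd.Env, "CUE_EXPERIMENT=evalv3")
    }
    start := time.Now()
    err := cmd.Run()
    return time.Since(start), err
}

func main() {
    for _, size := range []int{200, 2000, 20000} {
        // Regenerate data/data.json at this size (see the generator sketch above).
        if err := exec.Command("go", "run", "main.go", fmt.Sprintf("-size=%d", size)).Run(); err != nil {
            log.Fatal(err)
        }
        for _, v3 := range []bool{false, true} {
            d, err := evalOnce(v3)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("size=%d evalv3=%v real=%v\n", size, v3, d)
        }
    }
}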
What do the blue and orange lines stand for? Are those the old and new evaluator?
Yes. Orange: with CUE_EXPERIMENT=evalv3; blue: the old evaluator.