cue cue field comprehension (list => struct) execution time is quadratic with the list size

What version of CUE are you using (`cue version`)?

$ cue version
cue version v0.6.0

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1

Does this issue reproduce with the latest stable release?

Yes

What did you do?

Run the attached txtar file thus: testscript -v test-comprehension.txtar

What did you expect to see?

Execution times proportional to the data size

What did you see instead?

Execution times roughly proportional to the square of the data size

> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json

# exec head data/data.json (39.636s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.10 real         0.04 user         0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.41 real         0.44 user         0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
       37.96 real        38.19 user         0.25 sys
PASS

test-comprehension.txtar.txt

Sep 15 '23 15:09 rudifa

Details:

Script test-comprehension.txtar generates data files of specified list size similar to

{
  "items": [
    {
      "id": "acf48c8b-4084-43b2-a3cc-29061581f9f7",
      "value": "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
    },
    {
      "id": "5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b",
      "value": "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
    },
    {
      "id": "a346e657-b754-48fc-9b55-1ad79fd7c3b9",
      "value": "140c9602-da45-461c-8735-122347f1d057"
    }
  ]
}

and uses the schema

items: [
	{
		id:    string
		value: string
	},
	...,
]

#dict: {
	for item in items {
		"\( item.id )": {value: item.value}
	}
}

to convert data to a struct

"acf48c8b-4084-43b2-a3cc-29061581f9f7": {
    value: "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
}
"5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b": {
    value: "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
}
"a346e657-b754-48fc-9b55-1ad79fd7c3b9": {
    value: "140c9602-da45-461c-8735-122347f1d057"
}

Sep 15 '23 15:09 rudifa

Running the test-comprehension.txtar.txt script with cue v0.9.0-alpha.5 still demonstrates the execution time roughly proportional to the square of the data length when running this cue schema

items: [
	{
		id:    string
		value: string
	},
	...,
]

#dict: {
	for item in items {
		"\( item.id )": {value: item.value}
	}
}

testscript -v test-comprehension.txtar               
WORK=$WORK
...
# exec cat data/schema.cue (7.166s)
> go mod tidy
[stderr]
go: finding module for package github.com/google/uuid
go: downloading github.com/google/uuid v1.6.0
go: found github.com/google/uuid in github.com/google/uuid v1.6.0

> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json

# exec head data/data.json (40.951s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.05 real         0.04 user         0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.38 real         0.41 user         0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
       39.11 real        39.37 user         0.27 sys
> exec cue version
[stdout]
cue version v0.0.0-20240517091053-05453ff5230f

go version go1.22.2
      -buildmode exe
       -compiler gc
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1
             vcs git
    vcs.revision 05453ff5230f212e41da5969ea7196f1a54d16b1
        vcs.time 2024-05-17T09:10:53Z
    vcs.modified true
cue.lang.version v0.9.0
PASS

May 19 '24 10:05 rudifa

Have you been testing the new evaluator with CUE_EXPERIMENT=evalv3? All performance work is going to that new version.

May 19 '24 12:05 mvdan

@mvdan Thank you for reminding me to use CUE_EXPERIMENT=evalv3.

Here, the revised txtar script generates more data points, and a python script graphs the results.

With CUE_EXPERIMENT=evalv3 the execution is almost 8 times faster (orange curves).
However, the shape of the data Size vs. Real Time curve is still close to quadratic.

graph-loglog

graph-lin

Here are the scripts and their use to create above plots:

test-comprehension-090.txtar.txt graph.py.txt

 % testscript -v test-comprehension-090.txtar > before-evalv3.txt
 % testscript -v -e CUE_EXPERIMENT=evalv3 test-comprehension-090.txtar > with-evalv3.txt
 % graph.py -input before-evalv3.txt with-evalv3.txt -out graph-loglog.svg -loglog
 % graph.py -input before-evalv3.txt with-evalv3.txt -out graph-lin.svg

May 20 '24 13:05 rudifa

What do the blue and orange lines stand for? Are those the old and new evaluator?

May 20 '24 13:05 mvdan

Yes. orange: with CUE_EXPERIMENT=evalv3, blue: old evaluator.

May 20 '24 14:05 rudifa

cue cue copied to clipboard

cue field comprehension (list => struct) execution time is quadratic with the list size

What version of CUE are you using (cue version)?

Does this issue reproduce with the latest stable release?

What did you do?

What did you expect to see?

What did you see instead?

cue
cue copied to clipboard

What version of CUE are you using (`cue version`)?