cue icon indicating copy to clipboard operation
cue copied to clipboard

cue field comprehension (list => struct) execution time is quadratic with the list size

Open rudifa opened this issue 2 years ago • 6 comments

What version of CUE are you using (cue version)?

$ cue version
cue version v0.6.0

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1

Does this issue reproduce with the latest stable release?

Yes

What did you do?

Run the attached txtar file thus: testscript -v test-comprehension.txtar

What did you expect to see?

Execution times proportional to the data size

What did you see instead?

Execution times roughly proportional to the square of the data size

> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json

# exec head data/data.json (39.636s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.10 real         0.04 user         0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.41 real         0.44 user         0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
       37.96 real        38.19 user         0.25 sys
PASS

test-comprehension.txtar.txt

rudifa avatar Sep 15 '23 15:09 rudifa

Details:

Script test-comprehension.txtar generates data files of specified list size similar to

{
  "items": [
    {
      "id": "acf48c8b-4084-43b2-a3cc-29061581f9f7",
      "value": "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
    },
    {
      "id": "5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b",
      "value": "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
    },
    {
      "id": "a346e657-b754-48fc-9b55-1ad79fd7c3b9",
      "value": "140c9602-da45-461c-8735-122347f1d057"
    }
  ]
}

and uses the schema

items: [
	{
		id:    string
		value: string
	},
	...,
]

#dict: {
	for item in items {
		"\( item.id )": {value: item.value}
	}
}

to convert data to a struct

"acf48c8b-4084-43b2-a3cc-29061581f9f7": {
    value: "9cb42b6d-0fc6-4f1e-a329-0d7b46cb1ad8"
}
"5a33bca6-68c4-4eb4-afc9-6ce6ea92b45b": {
    value: "bb47a97b-fad1-43b9-b8fb-8769c0d1e4bc"
}
"a346e657-b754-48fc-9b55-1ad79fd7c3b9": {
    value: "140c9602-da45-461c-8735-122347f1d057"
}

rudifa avatar Sep 15 '23 15:09 rudifa

Running the test-comprehension.txtar.txt script with cue v0.9.0-alpha.5 still demonstrates the execution time roughly proportional to the square of the data length when running this cue schema

items: [
	{
		id:    string
		value: string
	},
	...,
]

#dict: {
	for item in items {
		"\( item.id )": {value: item.value}
	}
}
testscript -v test-comprehension.txtar               
WORK=$WORK
...
# exec cat data/schema.cue (7.166s)
> go mod tidy
[stderr]
go: finding module for package github.com/google/uuid
go: downloading github.com/google/uuid v1.6.0
go: found github.com/google/uuid in github.com/google/uuid v1.6.0

> go run main.go -size=200
[stdout]
Generated test data of size 200 and wrote it to file data/data.json

# exec head data/data.json (40.951s)
> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.05 real         0.04 user         0.01 sys
> go run main.go -size=2000
[stdout]
Generated test data of size 2000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
        0.38 real         0.41 user         0.02 sys
> go run main.go -size=20000
[stdout]
Generated test data of size 20000 and wrote it to file data/data.json

> exec time cue eval data/schema.cue data/data.json -e '#dict' -f -o data.cue
[stderr]
       39.11 real        39.37 user         0.27 sys
> exec cue version
[stdout]
cue version v0.0.0-20240517091053-05453ff5230f

go version go1.22.2
      -buildmode exe
       -compiler gc
  DefaultGODEBUG httplaxcontentlength=1,httpmuxgo121=1,tls10server=1,tlsrsakex=1,tlsunsafeekm=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1
             vcs git
    vcs.revision 05453ff5230f212e41da5969ea7196f1a54d16b1
        vcs.time 2024-05-17T09:10:53Z
    vcs.modified true
cue.lang.version v0.9.0
PASS

rudifa avatar May 19 '24 10:05 rudifa

Have you been testing the new evaluator with CUE_EXPERIMENT=evalv3? All performance work is going to that new version.

mvdan avatar May 19 '24 12:05 mvdan

@mvdan Thank you for reminding me to use CUE_EXPERIMENT=evalv3.

Here, the revised txtar script generates more data points, and a python script graphs the results.

  • With CUE_EXPERIMENT=evalv3 the execution is almost 8 times faster (orange curves).

  • However, the shape of the data Size vs. Real Time curve is still close to quadratic.

graph-loglog

graph-lin

Here are the scripts and their use to create above plots:

test-comprehension-090.txtar.txt graph.py.txt

 % testscript -v test-comprehension-090.txtar > before-evalv3.txt
 % testscript -v -e CUE_EXPERIMENT=evalv3 test-comprehension-090.txtar > with-evalv3.txt
 % graph.py -input before-evalv3.txt with-evalv3.txt -out graph-loglog.svg -loglog
 % graph.py -input before-evalv3.txt with-evalv3.txt -out graph-lin.svg 

rudifa avatar May 20 '24 13:05 rudifa

What do the blue and orange lines stand for? Are those the old and new evaluator?

mvdan avatar May 20 '24 13:05 mvdan

Yes. orange: with CUE_EXPERIMENT=evalv3, blue: old evaluator.

rudifa avatar May 20 '24 14:05 rudifa