go-cmp
go-cmp copied to clipboard
Diff is extremely slow
I tried to use cmp.Diff in a unit test, but for my data, the function is so slow that the unit tests time out. The data in question are two decoded versions of the "Go Regular" font (as described in the Go blog ). The comparison takes more than 16 seconds on a fast laptop.
Here is code to reproduce the problem:
package main
import (
"fmt"
"log"
"github.com/google/go-cmp/cmp"
"seehuhn.de/go/sfnt/glyph"
"seehuhn.de/go/pdf"
"seehuhn.de/go/pdf/font"
"seehuhn.de/go/pdf/font/charcode"
"seehuhn.de/go/pdf/font/gofont"
"seehuhn.de/go/pdf/font/opentype"
)
func main() {
otf, err := gofont.OpenType(gofont.GoRegular)
if err != nil {
log.Fatal(err)
}
encoding := make([]glyph.ID, 256)
encoding[65] = otf.CMap.Lookup('A')
encoding[66] = otf.CMap.Lookup('C')
toUnicode := map[charcode.CharCode][]rune{
65: {'A'},
66: {'C'},
}
info1 := &opentype.EmbedInfoSimpleCFF{
Font: otf,
SubsetTag: "UVWXYZ",
Encoding: encoding,
ToUnicode: toUnicode,
}
rw := pdf.NewData(pdf.V1_7)
ref := rw.Alloc()
err = info1.Embed(rw, ref)
if err != nil {
log.Fatal(err)
}
dicts, err := font.ExtractDicts(rw, ref)
if err != nil {
log.Fatal(err)
}
info2, err := opentype.Extract(rw, dicts)
if err != nil {
log.Fatal(err)
}
d := cmp.Diff(info1, info2)
fmt.Println(d)
}
The following go.mod
file shows the exact versions I used:
module seehuhn.de/go/issues/001
go 1.21.0
require (
github.com/google/go-cmp v0.5.9
seehuhn.de/go/pdf v0.3.5-0.20230815144317-1dc53998ea9b
seehuhn.de/go/sfnt v0.3.5-0.20230814123027-5af1767be82b
)
require (
github.com/xdg-go/stringprep v1.0.4 // indirect
golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 // indirect
golang.org/x/image v0.7.0 // indirect
golang.org/x/text v0.9.0 // indirect
seehuhn.de/go/dag v0.0.0-20230612165854-b02059e84ec5 // indirect
seehuhn.de/go/dijkstra v0.9.3 // indirect
seehuhn.de/go/postscript v0.3.5-0.20230811095027-86e1857cf919 // indirect
)
Since the font data is of moderate size (a few hundred glyphs, each containing a few tens of control points), I would expect the comparison to be much faster than 16 seconds.
Thank you for the bug report and reproduction. This is a good bug report.
It's known that there are a few edge-cases that cause cmp
to have pathological behavior (e.g., excessive branching and re-checking of results in certain nested structures). I was able to reproduce a execution time of 25s on my Ryzen 5900x. I won't have time to dig into what's going on in the near future, but can hopefully can fix this.
I've worked around slowness in cmp.Diff (with no cmpopts applied) by checking reflect.DeepEqual first; in the 5-ish times I've done this I've found it to always be correct and performant. Would it be reasonable to include that in the implementation of cmp.Diff? i.e. if reflect.DeepEqual returns true, have a fast-path return ""
, else do the full diffing that callers expect.
The semantics of reflect.DeepEqual
is not identical to cmp.Equal
even without any options specified, so that approach won't work in the general case (it might work for your use case). What we can do though is to analyze the set of options passed in and figure out exactly what information needs to be tracked and avoid doing that if it's unnecessary. For example, tracking the entire cmp.Path
is unnecessary if cmp.FilterPath
is not used.