go-cmp icon indicating copy to clipboard operation
go-cmp copied to clipboard

Diff is extremely slow

Open seehuhn opened this issue 1 year ago • 3 comments

I tried to use cmp.Diff in a unit test, but for my data, the function is so slow that the unit tests time out. The data in question are two decoded versions of the "Go Regular" font (as described in the Go blog ). The comparison takes more than 16 seconds on a fast laptop.

Here is code to reproduce the problem:

package main

import (
	"fmt"
	"log"

	"github.com/google/go-cmp/cmp"

	"seehuhn.de/go/sfnt/glyph"

	"seehuhn.de/go/pdf"
	"seehuhn.de/go/pdf/font"
	"seehuhn.de/go/pdf/font/charcode"
	"seehuhn.de/go/pdf/font/gofont"
	"seehuhn.de/go/pdf/font/opentype"
)

func main() {
	otf, err := gofont.OpenType(gofont.GoRegular)
	if err != nil {
		log.Fatal(err)
	}

	encoding := make([]glyph.ID, 256)
	encoding[65] = otf.CMap.Lookup('A')
	encoding[66] = otf.CMap.Lookup('C')

	toUnicode := map[charcode.CharCode][]rune{
		65: {'A'},
		66: {'C'},
	}

	info1 := &opentype.EmbedInfoSimpleCFF{
		Font:      otf,
		SubsetTag: "UVWXYZ",
		Encoding:  encoding,
		ToUnicode: toUnicode,
	}

	rw := pdf.NewData(pdf.V1_7)
	ref := rw.Alloc()
	err = info1.Embed(rw, ref)
	if err != nil {
		log.Fatal(err)
	}

	dicts, err := font.ExtractDicts(rw, ref)
	if err != nil {
		log.Fatal(err)
	}
	info2, err := opentype.Extract(rw, dicts)
	if err != nil {
		log.Fatal(err)
	}

	d := cmp.Diff(info1, info2)
	fmt.Println(d)
}

The following go.mod file shows the exact versions I used:

module seehuhn.de/go/issues/001

go 1.21.0

require (
	github.com/google/go-cmp v0.5.9
	seehuhn.de/go/pdf v0.3.5-0.20230815144317-1dc53998ea9b
	seehuhn.de/go/sfnt v0.3.5-0.20230814123027-5af1767be82b
)

require (
	github.com/xdg-go/stringprep v1.0.4 // indirect
	golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 // indirect
	golang.org/x/image v0.7.0 // indirect
	golang.org/x/text v0.9.0 // indirect
	seehuhn.de/go/dag v0.0.0-20230612165854-b02059e84ec5 // indirect
	seehuhn.de/go/dijkstra v0.9.3 // indirect
	seehuhn.de/go/postscript v0.3.5-0.20230811095027-86e1857cf919 // indirect
)

Since the font data is of moderate size (a few hundred glyphs, each containing a few tens of control points), I would expect the comparison to be much faster than 16 seconds.

seehuhn avatar Aug 15 '23 15:08 seehuhn

Thank you for the bug report and reproduction. This is a good bug report.

It's known that there are a few edge-cases that cause cmp to have pathological behavior (e.g., excessive branching and re-checking of results in certain nested structures). I was able to reproduce a execution time of 25s on my Ryzen 5900x. I won't have time to dig into what's going on in the near future, but can hopefully can fix this.

dsnet avatar Aug 15 '23 16:08 dsnet

I've worked around slowness in cmp.Diff (with no cmpopts applied) by checking reflect.DeepEqual first; in the 5-ish times I've done this I've found it to always be correct and performant. Would it be reasonable to include that in the implementation of cmp.Diff? i.e. if reflect.DeepEqual returns true, have a fast-path return "", else do the full diffing that callers expect.

Carrotman42 avatar Jan 30 '24 21:01 Carrotman42

The semantics of reflect.DeepEqual is not identical to cmp.Equal even without any options specified, so that approach won't work in the general case (it might work for your use case). What we can do though is to analyze the set of options passed in and figure out exactly what information needs to be tracked and avoid doing that if it's unnecessary. For example, tracking the entire cmp.Path is unnecessary if cmp.FilterPath is not used.

dsnet avatar Feb 01 '24 00:02 dsnet