Converting between string and []rune takes a long time
Consider the program below. The program runs slowly. Part of it is due to encoding and decoding between strings and runes. Roughly 1860ms out of 4050ms is spent doing this:
(pprof) top
4050ms of 9270ms total (43.69%)
Dropped 87 nodes (cum <= 46.35ms)
Showing top 10 nodes out of 138 (cum >= 580ms)
flat flat% sum% cum cum%
630ms 6.80% 6.80% 630ms 6.80% runtime.encoderune
600ms 6.47% 13.27% 2250ms 24.27% github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
560ms 6.04% 19.31% 1230ms 13.27% runtime.slicerunetostring
400ms 4.31% 23.62% 500ms 5.39% runtime.semrelease
360ms 3.88% 27.51% 1120ms 12.08% runtime.pcvalue
310ms 3.34% 30.85% 310ms 3.34% runtime.readvarint
310ms 3.34% 34.20% 620ms 6.69% runtime.step
300ms 3.24% 37.43% 300ms 3.24% github.com/sergi/go-diff/diffmatchpatch.runesEqual
300ms 3.24% 40.67% 410ms 4.42% runtime.lock
280ms 3.02% 43.69% 580ms 6.26% github.com/sergi/go-diff/diffmatchpatch.runesIndex
It should be possible to work on one representation during the algorithm to avoid this overhead.
(Note: This has non-deterministic behavior that was reported in #75.)
package main
import (
"flag"
"fmt"
"log"
"os"
"runtime/pprof"
"sync"
"sync/atomic"
"github.com/sergi/go-diff/diffmatchpatch"
)
var (
cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
dmp = diffmatchpatch.New()
)
const (
num = 50000
expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1 #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1 #} {0 'as is'basis, without warranties or conditions of any} {-1 #} {0 kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1 #} {0 under the license.}]"
unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."
known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
}
var matched, missed int32
var wg sync.WaitGroup
wg.Add(num)
for i := 0; i < num; i++ {
go func(i int) {
defer wg.Done()
diffs := dmp.DiffMain(unknown, known, false)
s := fmt.Sprintf("%v", diffs)
if s != expect {
//fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
atomic.AddInt32(&missed, 1)
} else {
atomic.AddInt32(&matched, 1)
}
}(i)
}
wg.Wait()
fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}
It is true that the code code be optimized a lot, woud you like to work on that? I can reproduce your profile with your example. However, if i set the the diff timeout (I mentioned that in your other issue) the profile changes to the following. (btw Which version of Go and architecture are you using?)
Go 1.7.1 (amd64)
7760ms of 14340ms total (54.11%)
Dropped 115 nodes (cum <= 71.70ms)
Showing top 10 nodes out of 116 (cum >= 800ms)
flat flat% sum% cum cum%
1400ms 9.76% 9.76% 4360ms 30.40% github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
1210ms 8.44% 18.20% 1210ms 8.44% runtime.readvarint
1010ms 7.04% 25.24% 3400ms 23.71% runtime.pcvalue
930ms 6.49% 31.73% 6640ms 46.30% runtime.gentraceback
770ms 5.37% 37.10% 1980ms 13.81% runtime.step
640ms 4.46% 41.56% 1300ms 9.07% runtime.scanblock
620ms 4.32% 45.89% 620ms 4.32% runtime.heapBitsForObject
400ms 2.79% 48.68% 2130ms 14.85% runtime.mallocgc
390ms 2.72% 51.39% 580ms 4.04% runtime.scanobject
390ms 2.72% 54.11% 800ms 5.58% runtime.stringtoslicerune
and Go 1.8.1 (amd64)
5940ms of 11260ms total (52.75%)
Dropped 86 nodes (cum <= 56.30ms)
Showing top 10 nodes out of 122 (cum >= 270ms)
flat flat% sum% cum cum%
1520ms 13.50% 13.50% 5570ms 49.47% github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
720ms 6.39% 19.89% 2080ms 18.47% runtime.pcvalue
720ms 6.39% 26.29% 720ms 6.39% runtime.readvarint
550ms 4.88% 31.17% 3520ms 31.26% runtime.gentraceback
550ms 4.88% 36.06% 3380ms 30.02% runtime.mallocgc
450ms 4.00% 40.05% 540ms 4.80% runtime.scanobject
450ms 4.00% 44.05% 1170ms 10.39% runtime.step
440ms 3.91% 47.96% 440ms 3.91% runtime.memclrNoHeapPointers
270ms 2.40% 50.36% 580ms 5.15% runtime.slicerunetostring
270ms 2.40% 52.75% 270ms 2.40% runtime.usleep
I think the version of Go is 1.8 on an amd64 machine. I can try to work on performance. If you have any points that would be a good place to start, please let me know. :-)
You can find out which architecture you are on using go env.
I would suggest to first add more benchmarks. We already have some basic benchmarks but it is sometimes hard to compare the impact of changes with them. The most interesting function as a user (as you might have guessed) is "DiffMain" which uses almost everything of the diff code. @maksimov is working on "commonSuffixLength" (and maybe the prefix version too) which looks promising.
EDIT: I forgot that the profiles in this issue are for your example (which is goroutine heavy) and not for our current benchmarks. So here is another one (which also includes DiffMain):
$ go test -cpuprofile=out.prof -bench=.
BenchmarkDiffCommonPrefix-4 2000000 609 ns/op
BenchmarkDiffCommonSuffix-4 3000000 596 ns/op
BenchmarkDiffHalfMatch-4 3000 416420 ns/op
BenchmarkDiffCleanupSemantic-4 50 26762635 ns/op
BenchmarkDiffMain-4 1 1026348830 ns/op
BenchmarkDiffMainLarge-4 5 240501544 ns/op
BenchmarkDiffMainRunesLargeLines-4 1000 1117465 ns/op
PASS
ok github.com/sergi/go-diff/diffmatchpatch 11.980s
$ go tool pprof diffmatchpatch.test out.prof
Entering interactive mode (type "help" for commands)
(pprof) top
10340ms of 12720ms total (81.29%)
Dropped 112 nodes (cum <= 63.60ms)
Showing top 10 nodes out of 83 (cum >= 410ms)
flat flat% sum% cum cum%
3530ms 27.75% 27.75% 4000ms 31.45% github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
2710ms 21.31% 49.06% 5070ms 39.86% runtime.stringtoslicerune
2270ms 17.85% 66.90% 2270ms 17.85% runtime.charntorune
360ms 2.83% 69.73% 440ms 3.46% runtime.scanobject
350ms 2.75% 72.48% 350ms 2.75% runtime.memmove
290ms 2.28% 74.76% 290ms 2.28% runtime.heapBitsSetType
210ms 1.65% 76.42% 210ms 1.65% github.com/sergi/go-diff/diffmatchpatch.commonSuffixLength
210ms 1.65% 78.07% 210ms 1.65% github.com/sergi/go-diff/diffmatchpatch.runesEqual
210ms 1.65% 79.72% 210ms 1.65% runtime.memclr
200ms 1.57% 81.29% 410ms 3.22% github.com/sergi/go-diff/diffmatchpatch.runesIndex
(pprof)
This profile suggests "diffBisect" and the string<->rune conversion.
My "go env" is this, but we run the build in parallel on many different machines and the version of go that uses might not be the same as on my desktop.
go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/morbo/go"
GORACE=""
GOROOT="/usr/lib/google-golang"
GOTOOLDIR="/usr/lib/google-golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build628312538=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"