go-diff icon indicating copy to clipboard operation
go-diff copied to clipboard

Converting between string and []rune takes a long time

Open bwendling opened this issue 8 years ago • 4 comments

Consider the program below. The program runs slowly. Part of it is due to encoding and decoding between strings and runes. Roughly 1860ms out of 4050ms is spent doing this:

(pprof) top
4050ms of 9270ms total (43.69%)
Dropped 87 nodes (cum <= 46.35ms)
Showing top 10 nodes out of 138 (cum >= 580ms)
      flat  flat%   sum%        cum   cum%
     630ms  6.80%  6.80%      630ms  6.80%  runtime.encoderune
     600ms  6.47% 13.27%     2250ms 24.27%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
     560ms  6.04% 19.31%     1230ms 13.27%  runtime.slicerunetostring
     400ms  4.31% 23.62%      500ms  5.39%  runtime.semrelease
     360ms  3.88% 27.51%     1120ms 12.08%  runtime.pcvalue
     310ms  3.34% 30.85%      310ms  3.34%  runtime.readvarint
     310ms  3.34% 34.20%      620ms  6.69%  runtime.step
     300ms  3.24% 37.43%      300ms  3.24%  github.com/sergi/go-diff/diffmatchpatch.runesEqual
     300ms  3.24% 40.67%      410ms  4.42%  runtime.lock
     280ms  3.02% 43.69%      580ms  6.26%  github.com/sergi/go-diff/diffmatchpatch.runesIndex

It should be possible to work on one representation during the algorithm to avoid this overhead.

(Note: This has non-deterministic behavior that was reported in #75.)

package main

import (
        "flag"
        "fmt"
        "log"
        "os"
        "runtime/pprof"
        "sync"
        "sync/atomic"

        "github.com/sergi/go-diff/diffmatchpatch"
)

var (
        cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
        dmp        = diffmatchpatch.New()
)

const (
        num    = 50000
        expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1  #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1  #} {0 'as is'basis, without warranties or conditions of any} {-1  #} {0  kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1  #} {0  under the license.}]"

        unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."

        known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)

func main() {
        flag.Parse()
        if *cpuprofile != "" {
                f, err := os.Create(*cpuprofile)
                if err != nil {
                        log.Fatal(err)
                }
                pprof.StartCPUProfile(f)
                defer pprof.StopCPUProfile()
        }

        var matched, missed int32
        var wg sync.WaitGroup
        wg.Add(num)
        for i := 0; i < num; i++ {
                go func(i int) {
                        defer wg.Done()
                        diffs := dmp.DiffMain(unknown, known, false)
                        s := fmt.Sprintf("%v", diffs)
                        if s != expect {
                                //fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
                                atomic.AddInt32(&missed, 1)
                        } else {
                                atomic.AddInt32(&matched, 1)
                        }
                }(i)
        }
        wg.Wait()
        fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
        fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}

bwendling avatar Mar 13 '17 18:03 bwendling

It is true that the code code be optimized a lot, woud you like to work on that? I can reproduce your profile with your example. However, if i set the the diff timeout (I mentioned that in your other issue) the profile changes to the following. (btw Which version of Go and architecture are you using?)

Go 1.7.1 (amd64)

7760ms of 14340ms total (54.11%)
Dropped 115 nodes (cum <= 71.70ms)
Showing top 10 nodes out of 116 (cum >= 800ms)
      flat  flat%   sum%        cum   cum%
    1400ms  9.76%  9.76%     4360ms 30.40%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
    1210ms  8.44% 18.20%     1210ms  8.44%  runtime.readvarint
    1010ms  7.04% 25.24%     3400ms 23.71%  runtime.pcvalue
     930ms  6.49% 31.73%     6640ms 46.30%  runtime.gentraceback
     770ms  5.37% 37.10%     1980ms 13.81%  runtime.step
     640ms  4.46% 41.56%     1300ms  9.07%  runtime.scanblock
     620ms  4.32% 45.89%      620ms  4.32%  runtime.heapBitsForObject
     400ms  2.79% 48.68%     2130ms 14.85%  runtime.mallocgc
     390ms  2.72% 51.39%      580ms  4.04%  runtime.scanobject
     390ms  2.72% 54.11%      800ms  5.58%  runtime.stringtoslicerune

and Go 1.8.1 (amd64)

5940ms of 11260ms total (52.75%)                                                                                     
Dropped 86 nodes (cum <= 56.30ms)                                                                                    
Showing top 10 nodes out of 122 (cum >= 270ms)
      flat  flat%   sum%        cum   cum%
    1520ms 13.50% 13.50%     5570ms 49.47%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
     720ms  6.39% 19.89%     2080ms 18.47%  runtime.pcvalue
     720ms  6.39% 26.29%      720ms  6.39%  runtime.readvarint
     550ms  4.88% 31.17%     3520ms 31.26%  runtime.gentraceback
     550ms  4.88% 36.06%     3380ms 30.02%  runtime.mallocgc
     450ms  4.00% 40.05%      540ms  4.80%  runtime.scanobject
     450ms  4.00% 44.05%     1170ms 10.39%  runtime.step
     440ms  3.91% 47.96%      440ms  3.91%  runtime.memclrNoHeapPointers
     270ms  2.40% 50.36%      580ms  5.15%  runtime.slicerunetostring
     270ms  2.40% 52.75%      270ms  2.40%  runtime.usleep

zimmski avatar Apr 09 '17 14:04 zimmski

I think the version of Go is 1.8 on an amd64 machine. I can try to work on performance. If you have any points that would be a good place to start, please let me know. :-)

bwendling avatar Apr 10 '17 04:04 bwendling

You can find out which architecture you are on using go env.

I would suggest to first add more benchmarks. We already have some basic benchmarks but it is sometimes hard to compare the impact of changes with them. The most interesting function as a user (as you might have guessed) is "DiffMain" which uses almost everything of the diff code. @maksimov is working on "commonSuffixLength" (and maybe the prefix version too) which looks promising.

EDIT: I forgot that the profiles in this issue are for your example (which is goroutine heavy) and not for our current benchmarks. So here is another one (which also includes DiffMain):

$ go test -cpuprofile=out.prof -bench=.
BenchmarkDiffCommonPrefix-4              2000000               609 ns/op
BenchmarkDiffCommonSuffix-4              3000000               596 ns/op
BenchmarkDiffHalfMatch-4                    3000            416420 ns/op
BenchmarkDiffCleanupSemantic-4                50          26762635 ns/op
BenchmarkDiffMain-4                            1        1026348830 ns/op
BenchmarkDiffMainLarge-4                       5         240501544 ns/op
BenchmarkDiffMainRunesLargeLines-4          1000           1117465 ns/op
PASS
ok      github.com/sergi/go-diff/diffmatchpatch 11.980s
$ go tool pprof diffmatchpatch.test out.prof 
Entering interactive mode (type "help" for commands)
(pprof) top
10340ms of 12720ms total (81.29%)
Dropped 112 nodes (cum <= 63.60ms)
Showing top 10 nodes out of 83 (cum >= 410ms)
      flat  flat%   sum%        cum   cum%
    3530ms 27.75% 27.75%     4000ms 31.45%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
    2710ms 21.31% 49.06%     5070ms 39.86%  runtime.stringtoslicerune
    2270ms 17.85% 66.90%     2270ms 17.85%  runtime.charntorune
     360ms  2.83% 69.73%      440ms  3.46%  runtime.scanobject
     350ms  2.75% 72.48%      350ms  2.75%  runtime.memmove
     290ms  2.28% 74.76%      290ms  2.28%  runtime.heapBitsSetType
     210ms  1.65% 76.42%      210ms  1.65%  github.com/sergi/go-diff/diffmatchpatch.commonSuffixLength
     210ms  1.65% 78.07%      210ms  1.65%  github.com/sergi/go-diff/diffmatchpatch.runesEqual
     210ms  1.65% 79.72%      210ms  1.65%  runtime.memclr
     200ms  1.57% 81.29%      410ms  3.22%  github.com/sergi/go-diff/diffmatchpatch.runesIndex
(pprof) 

This profile suggests "diffBisect" and the string<->rune conversion.

zimmski avatar Apr 10 '17 08:04 zimmski

My "go env" is this, but we run the build in parallel on many different machines and the version of go that uses might not be the same as on my desktop.

go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/usr/local/google/home/morbo/go"
GORACE=""
GOROOT="/usr/lib/google-golang"
GOTOOLDIR="/usr/lib/google-golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build628312538=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

bwendling avatar Apr 12 '17 06:04 bwendling