GoReSym profile runtime and optimize performance

GoReSym is pretty slow and this makes it difficult to deploy at a large scale. Despite being written in Go and compiled to native code, it may take seconds or longer to process input files. For example, to run the entire test suite, it takes many minutes.

Given that Go programs can execute in a split second and their runtime uses all the metadata that GoReSym parses, it should be possible for GoReSym to run in a split second too, at least in the common case. (I understand that in obfuscated binaries that some scans may take a bit longer).

We should profile GoReSym to identify the slow parts of the algorithm and optimize their implementation. Our goal should be for GoReSym to finish within 0.10s for the common case.

Aug 03 '23 21:08 williballenthin

to profile using the tests:

go get github.com/pkg/profile

and then add the following to TestAllVersions:

defer profile.Start(profile.ProfilePath(".")).Stop()

via: https://hackernoon.com/go-the-complete-guide-to-profiling-your-code-h51r3waz

Aug 03 '23 21:08 williballenthin

here's cpu.pprof from running tests on #30 cpu.zip

@williballenthin ➜ /workspaces/GoReSym (fix/issue-29) $ pprof -cum -top cpu.pprof 
File: GoReSym.test
Type: cpu
Time: Aug 3, 2023 at 9:29pm (UTC)
Duration: 352.86s, Total samples = 354.07s (100.34%)
Showing nodes accounting for 331.39s, 93.59% of 354.07s total
Dropped 581 nodes (cum <= 1.77s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%    335.93s 94.88%  testing.tRunner
         0     0%     0%    335.91s 94.87%  github.com/mandiant/GoReSym.TestAllVersions.func1
         0     0%     0%    335.91s 94.87%  github.com/mandiant/GoReSym.main_impl
         0     0%     0%    295.12s 83.35%  github.com/mandiant/GoReSym/objfile.(*Entry).PCLineTable
         0     0%     0%    295.12s 83.35%  github.com/mandiant/GoReSym/objfile.(*File).PCLineTable (inline)
         0     0%     0%    289.02s 81.63%  github.com/mandiant/GoReSym/objfile.findModuleInitPCHeader
         0     0%     0%    288.37s 81.44%  rsc.io/binaryregexp.(*Regexp).FindAllIndex
         0     0%     0%    288.37s 81.44%  rsc.io/binaryregexp.(*Regexp).allMatches
         0     0%     0%    288.37s 81.44%  rsc.io/binaryregexp.(*Regexp).doExecute
    33.40s  9.43%  9.43%    288.28s 81.42%  rsc.io/binaryregexp.(*machine).match
    79.41s 22.43% 31.86%    183.96s 51.96%  rsc.io/binaryregexp.(*machine).step
         0     0% 31.86%    169.58s 47.89%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln
         0     0% 31.86%    169.57s 47.89%  github.com/mandiant/GoReSym/objfile.(*elfFile).pcln_scan
   131.32s 37.09% 68.95%    150.59s 42.53%  rsc.io/binaryregexp.(*machine).add
         0     0% 68.95%     95.43s 26.95%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln
         0     0% 68.95%     95.43s 26.95%  github.com/mandiant/GoReSym/objfile.(*machoFile).pcln_scan
         0     0% 68.95%     29.15s  8.23%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln
         0     0% 68.95%     29.14s  8.23%  github.com/mandiant/GoReSym/objfile.(*peFile).pcln_scan
         0     0% 68.95%     28.55s  8.06%  github.com/mandiant/GoReSym/objfile.(*Entry).ModuleDataTable
         0     0% 68.95%     28.55s  8.06%  github.com/mandiant/GoReSym/objfile.(*File).ModuleDataTable
    27.90s  7.88% 76.83%     27.90s  7.88%  runtime.memmove
         0     0% 76.83%     19.57s  5.53%  github.com/mandiant/GoReSym/debug/elf.(*File).DataAfterSection
     0.04s 0.011% 76.84%     19.01s  5.37%  runtime.systemstack
         0     0% 76.84%     18.39s  5.19%  github.com/mandiant/GoReSym/objfile.(*elfFile).moduledata_scan
     0.06s 0.017% 76.86%     17.52s  4.95%  runtime.growslice
     1.96s  0.55% 77.41%     14.10s  3.98%  rsc.io/binaryregexp/syntax.(*Inst).MatchRune (inline)
     0.75s  0.21% 77.62%     13.27s  3.75%  runtime.mallocgc
    12.13s  3.43% 81.05%     12.14s  3.43%  rsc.io/binaryregexp/syntax.(*Inst).MatchRunePos
     0.01s 0.0028% 81.05%     11.58s  3.27%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseTypeLinks
         0     0% 81.05%     11.58s  3.27%  github.com/mandiant/GoReSym/objfile.(*File).ParseTypeLinks (inline)
     0.02s 0.0056% 81.06%     11.33s  3.20%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType
     0.01s 0.0028% 81.06%     11.21s  3.17%  runtime.gcBgMarkWorker
     0.14s  0.04% 81.10%     11.18s  3.16%  github.com/mandiant/GoReSym/objfile.(*Entry).ParseType_impl
     0.02s 0.0056% 81.11%     10.63s  3.00%  os.(*File).ReadAt
     0.03s 0.0085% 81.11%     10.61s  3.00%  os.(*File).pread (inline)
     0.04s 0.011% 81.13%     10.58s  2.99%  internal/poll.(*FD).Pread
     0.07s  0.02% 81.14%     10.58s  2.99%  io.(*SectionReader).ReadAt
     0.04s 0.011% 81.16%     10.35s  2.92%  syscall.Syscall6
         0     0% 81.16%     10.34s  2.92%  syscall.Pread (inline)
         0     0% 81.16%     10.34s  2.92%  syscall.pread
     9.87s  2.79% 83.94%      9.87s  2.79%  runtime/internal/syscall.Syscall6
     0.01s 0.0028% 83.95%      9.81s  2.77%  syscall.RawSyscall6
     0.01s 0.0028% 83.95%      9.69s  2.74%  runtime.gcBgMarkWorker.func2
     0.01s 0.0028% 83.95%      9.68s  2.73%  github.com/mandiant/GoReSym/debug/macho.(*File).DataAfterSection
     0.05s 0.014% 83.97%      9.68s  2.73%  runtime.gcDrain
     0.01s 0.0028% 83.97%      9.44s  2.67%  github.com/mandiant/GoReSym/objfile.(*machoFile).moduledata_scan
     3.57s  1.01% 84.98%      8.42s  2.38%  runtime.scanobject
     0.04s 0.011% 84.99%      7.46s  2.11%  runtime.makeslice
     7.06s  1.99% 86.98%      7.07s  2.00%  rsc.io/binaryregexp.(*inputBytes).step
     6.81s  1.92% 88.91%      6.81s  1.92%  runtime.memclrNoHeapPointers
     6.67s  1.88% 90.79%      6.72s  1.90%  rsc.io/binaryregexp.(*machine).alloc (inline)
     0.02s 0.0056% 90.80%      5.74s  1.62%  github.com/mandiant/GoReSym/debug/elf.(*Section).Data
     0.08s 0.023% 90.82%      4.72s  1.33%  github.com/mandiant/GoReSym/objfile.(*Entry).readRTypeName
         0     0% 90.82%      4.70s  1.33%  runtime.markroot
     0.06s 0.017% 90.84%      4.21s  1.19%  runtime.deductAssistCredit
         0     0% 90.84%      4.15s  1.17%  runtime.gcAssistAlloc
         0     0% 90.84%      3.79s  1.07%  runtime.gcAssistAlloc.func1
         0     0% 90.84%      3.79s  1.07%  runtime.gcAssistAlloc1
     0.08s 0.023% 90.86%      3.75s  1.06%  runtime.gcDrainN
         0     0% 90.86%      3.60s  1.02%  runtime.memclrNoHeapPointersChunked
     0.09s 0.025% 90.88%      3.31s  0.93%  github.com/mandiant/GoReSym/objfile.(*peFile).read_memory
     0.01s 0.0028% 90.89%      3.19s   0.9%  github.com/mandiant/GoReSym/debug/macho.(*Section).Data
     0.12s 0.034% 90.92%      3.01s  0.85%  runtime.bgsweep
         0     0% 90.92%         3s  0.85%  runtime.bgscavenge
     0.03s 0.0085% 90.93%      2.99s  0.84%  runtime.(*scavengerState).run
         0     0% 90.93%      2.96s  0.84%  runtime.(*scavengerState).init.func2
     2.95s  0.83% 91.76%      2.95s  0.83%  rsc.io/binaryregexp.newLazyFlag (inline)
     1.84s  0.52% 92.28%      2.95s  0.83%  runtime.findObject
         0     0% 92.28%      2.94s  0.83%  runtime.(*pageAlloc).scavenge
     0.01s 0.0028% 92.28%      2.93s  0.83%  runtime.(*pageAlloc).scavenge.func1
         0     0% 92.28%      2.92s  0.82%  runtime.(*pageAlloc).scavengeOne
         0     0% 92.28%      2.86s  0.81%  runtime.markroot.func1
     0.01s 0.0028% 92.29%      2.80s  0.79%  runtime.sysUnused
     0.02s 0.0056% 92.29%      2.79s  0.79%  runtime.sysUnusedOS
     2.77s  0.78% 93.07%      2.77s  0.78%  runtime.madvise
     0.04s 0.011% 93.09%      2.75s  0.78%  github.com/mandiant/GoReSym/objfile.(*elfFile).read_memory
     0.18s 0.051% 93.14%      2.33s  0.66%  runtime.suspendG
     0.04s 0.011% 93.15%      2.26s  0.64%  io.ReadAtLeast
         0     0% 93.15%      2.26s  0.64%  io.ReadFull (inline)
         0     0% 93.15%      2.18s  0.62%  io.(*SectionReader).Read
     1.33s  0.38% 93.52%      1.92s  0.54%  runtime.scanblock
     0.24s 0.068% 93.59%      1.88s  0.53%  bytes.Index
     0.01s 0.0028% 93.59%      1.85s  0.52%  runtime.gcMarkDone
         0     0% 93.59%      1.79s  0.51%  runtime.markrootBlock

Aug 03 '23 21:08 williballenthin

Leaving this open for future optimization work. The kubectl binary is a great test case, it's massive and thoroughly exercises most code paths when run with -t -p -d flags.

Aug 15 '23 19:08 stevemk14ebr

Easier way to profile use https://www.speedscope.app/ and generate profiles by just importing _ "net/http/pprof" at the top of a package and run with:

go test -run TestBig -cpuprofile cpu.txt

Runtime is still dominated by scanning for moduledata but also the time it takes to parse types via ParseType_impl. Can probably minimize the moduledata scans a bit with some algorithm adjustments to the recovery. We're probably close to scanning through memory as fast as possible. I was able to see it's using bytealg.Index which is an AVX2 hand rolled assembly routine - we hit the fast path as long as our needle is < 64 bytes which we always will be.

Aug 16 '23 19:08 stevemk14ebr

Unreasonable memory consumption leading to OOM in sample https://github.com/EdgeGuardP/EdgeGuard-Stealer/releases/tag/EdgeStealer4.0

Aug 29 '23 16:08 stevemk14ebr

the regex convert from yara is too slow, maybe we can convert it to rangel patterns

https://www.colm.net/open-source/ragel/

Sep 15 '23 09:09 virusdefender

@virusdefender is it specifically the conversion routine that is slow or more generally the memory scanning? Can you show some benchmarks that demonstrate the performance issue perhaps?

Sep 16 '23 14:09 stevemk14ebr

@virusdefender is it specifically the conversion routine that is slow or more generally the memory scanning? Can you show some benchmarks that demonstrate the performance issue perhaps?

using the pprof result in https://github.com/mandiant/GoReSym/issues/31#issuecomment-1664681057

it shows the regex match is slow

Sep 17 '23 01:09 virusdefender

@virusdefender that pprof result is out of date, if you can profile the application on the current master you may find it's faster. If not, please provide some data suggesting where else we may be missing optimizations :)

Oct 06 '23 18:10 stevemk14ebr

GoReSym GoReSym copied to clipboard

profile runtime and optimize performance

GoReSym
GoReSym copied to clipboard