
Memory leak caused by diffs?

Open · alyssais opened this issue 7 years ago · 19 comments

I'm using Rugged in an application that generates thousands of diffs in quick succession, and I'm noticing memory growing steadily as it runs.

I've managed to reproduce the problem in this reduced test case, which just generates a lot of diffs with a big repo (rails/rails). You can run this script and watch the memory consumption just go up and up and up.

require "rugged"

def diff_parents(commit)
  puts commit.oid
  commit.parents.each { |parent| parent.diff(commit) }
  commit.parents.each { |parent| diff_parents(parent) }
rescue SystemStackError # This can hit Ruby's recursion limit. Just move on.
end

path = "#{__dir__}/rails"
if File.directory?(path)
  repo = Rugged::Repository.new(path)
else
  print "Cloning rails…"
  repo = Rugged::Repository.clone_at("https://github.com/rails/rails", path)
  puts " done"
end

diff_parents repo.head.target
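
To watch the growth from inside the process rather than with an external monitor, something like this would also work (a rough sketch, not part of the script above; it assumes a ps that reports RSS in KB, as on macOS and Linux):

# Hypothetical helper: sample this process's resident set size via ps.
def rss_mb
  `ps -o rss= -p #{Process.pid}`.to_i / 1024
end

# e.g. inside diff_parents, after generating the diffs:
#   puts "#{commit.oid} ~#{rss_mb} MB RSS"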

I don't have the C knowledge to figure out the cause of the leak, so this could be a libgit2 issue rather than a Rugged-specific one.

alyssais avatar Apr 18 '17 16:04 alyssais

@alyssais hi! Thanks for reporting this. I think what we're hitting here is cache growth. libgit2 keeps a cache of diffs for each call to diff. That cache lives as long as the Rugged::Repository object, so in the case of your sample program it lives as long as the program. The default maximum cache size seems rather large, and unfortunately Rugged doesn't give us any insight into the cache usage. I've opened #696 so we can see from Ruby land what the cache size / usage is. libgit2 provides an API for tuning (or disabling) that cache, but I didn't add an API for that.

@kivikakk was able to reproduce the leak with a script based on your script, and I've pasted it below:

require "rugged"

def diff_parents(repo, commit)
  tbd = [commit]
  t = 0

  while tbd.length > 0
    commit = tbd.shift
    t += 1

    puts "#{t} #{tbd.length}" if t % 100 == 0

    commit.parents.each { |parent| parent.diff(commit); tbd << parent }
    tbd = tbd[0..100]
  end
end

repo = Rugged::Repository.new("rails")
diff_parents repo, repo.head.target

I found that if I created a new Repository object every once in a while, I could get the memory usage to plateau:

require "rugged"

def diff_parents(repo, commit)
  tbd = [commit]
  t = 0

  while tbd.length > 0
    commit = tbd.shift
    t += 1

    puts "#{t} #{tbd.length}" if t % 100 == 0

    commit.parents.each { |parent| parent.diff(commit); tbd << parent }
    tbd = tbd[0..100]

    if t % 100 == 0 # Limit repo cache size
      tbd.map!(&:oid)
      repo = Rugged::Repository.new repo.path
      tbd.map! { |id| repo.lookup id }
    end
  end
end

repo = Rugged::Repository.new("rails")
diff_parents repo, repo.head.target

Memory usage should plateau around this size plus whatever other memory Ruby is using. Do you ever see a plateau? If you rework your program to create new Repository objects, does it reduce consumption? (I know allocating new repo objects is a huge hack 🙇)

Thanks in advance!

Note: the reason I think it's a cache issue is that neither leaks nor valgrind reports any leaked memory.

tenderlove avatar Apr 19 '17 18:04 tenderlove

Hi @tenderlove, thanks for investigating for me! I don't think I've ever seen a plateau, although since the cache size is so big, it's possible I just wouldn't have noticed it! I'll run a comparison with creating new repository objects and report back.

alyssais avatar Apr 19 '17 19:04 alyssais

Something else that's just occurred to me — there's a Repository#close method that looks interesting, now that it appears to be a caching problem:

Frees all the resources used by this repository immediately. The repository can still be used after this call. Resources will be opened as necessary.

It is not required to call this method explicitly. Repositories are closed automatically before garbage collection

I wonder, would that affect this cache? Looks like it might — this is the libgit2 function it calls. I'll run a test where I occasionally call that, too.
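
In other words, if I read the docs right, a pattern like this should be safe (just a quick sketch of the described behaviour, not something I've measured yet):

repo = Rugged::Repository.new(path)
repo.head.target.oid  # opens and uses the underlying resources
repo.close            # frees them immediately (and, hopefully, that cache)
repo.head.target.oid  # still works; resources are reopened as needed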

alyssais avatar Apr 19 '17 19:04 alyssais

With no changes, my program currently peaks at ~1.4GB memory, of which ~232MB is apparently cache.

I tried calling #close on the repo after every diff I generated, just to see what impact it made. Memory use never rose above 1.2GB and, more importantly, would consistently drop back below 300MB. I'm not worried about the high memory use itself, since the demands of my program are pretty intensive; what worried me was that the minimum would creep up over time, so I think that's what was happening here.

I didn't try creating new repository instances because adapting my program to do that would be significantly more difficult — the whole execution of the program is spent iterating a walker attached to a repository instance, so I'd have to keep track of a lot of walker state and rebuild it every so often or the repository would just be retained. Since calling #close seems to have had the same effect, I didn't think it was worth it, but will have a go at it if you think it would make a difference.
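
For what it's worth, recreating the repository mid-walk would mean something like this hypothetical resume_walk helper, which re-walks from the head on a fresh Repository and skips everything already handled; the seen_oids bookkeeping (and the re-enumeration) is exactly the part I'd rather avoid:

require "rugged"
require "set"

# Hypothetical sketch: resume a walk on a fresh Repository by re-walking from
# the head and skipping OIDs already processed. Works, but it has to
# re-enumerate every commit seen so far on each rebuild.
def resume_walk(path, head_oid, seen_oids)
  repo = Rugged::Repository.new(path)
  remaining = repo.walk(head_oid, Rugged::SORT_TOPO).lazy.reject { |c| seen_oids.include?(c.oid) }
  [repo, remaining]
end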

Thanks for your help! I spent quite a while on your blog yesterday while I was trying to track this down! :D

alyssais avatar Apr 19 '17 20:04 alyssais

consistently decrease to <300MB

This is great!!

Since calling #close seems to have had the same effect, I didn't think it was worth it, but will have a go at it if you think it would make a difference.

I don't think so. I didn't know that close would impact the cache, but after reading the libgit2 code, I think you're absolutely right. If calling close on the repo keeps your memory issues under control, then please stick with it! In the meantime, I'll try to add functions to Rugged that allow you to tweak the cache settings so that you don't have to change your code. Though I guess the caveat is that the cache settings are global, so changing them would impact all repository objects in the same process.

Thanks for your help! I spent quite a while on your blog yesterday while I was trying to track this down! :D

You're very welcome! I hope it was of some help, though with the opaqueness of this cache, I'm not so sure. 😄

tenderlove avatar Apr 19 '17 21:04 tenderlove

It might be worth considering renaming (or at least aliasing) #close, too? It's a really non-obvious name, to me at least — especially the fact that the repository object is still perfectly usable after calling it.

alyssais avatar Apr 19 '17 22:04 alyssais

I agree. close really sounds like you won't be able to use the object anymore, like on a file. Looks like the libgit2 API calls it cleanup, but I'm not totally sure what the function does (specifically the set_* calls), so I don't know if there is a better name. Do you have any suggestions?

/cc @carlosmn @arthurschreiber

tenderlove avatar Apr 19 '17 22:04 tenderlove

I was going to suggest cleanup, since that's what libgit2 seems to call it, or clean, maybe.

alyssais avatar Apr 20 '17 11:04 alyssais

Bah, even though I'm calling #close periodically, I'm still getting a memory buildup over time.

Here's a graph (y-axis is %mem used by the script, x-axis is seconds):

    3 +-+---------+----------+-----------+-----------+----------+---------+-+   
      +           +          +           +           +          ***         +   
      |                                                   ******            |   
  2.5 +-+                                 ****************                +-+   
      |                 *******************                                 |   
      |             *****                                                   |   
    2 +-************                                                      +-+   
      |**                                                                   |   
      |                                                                     |   
  1.5 +-+                                                                 +-+   
      |                                                                     |   
      |                                                                     |   
      |                                                                     |   
    1 +-+                                                                 +-+   
      |                                                                     |   
      |                                                                     |   
  0.5 +-+                                                                 +-+   
      |                                                                     |   
      +           +          +           +           +          +           +   
    0 +-+---------+----------+-----------+-----------+----------+---------+-+   
      0           20         40          60          80        100         120  

Here's how that graph is generated (using feedgnuplot):

while true; do ps -o %mem= <PID>; sleep 1; done | feedgnuplot --lines --stream --terminal 'dumb 80,24' --extracmds 'unset grid' --ymin 0 --xmin 0

And here's the test script I'm using:

require "rugged"

puts "PID: #$$"

path = "#{__dir__}/rails"
if File.directory?(path)
  repo = Rugged::Repository.new(path)
else
  print "Cloning rails…"
  repo = Rugged::Repository.clone_at("https://github.com/rails/rails", path)
  puts " done"
end

puts "Creating walk array"
walk = repo.walk(repo.head.target, Rugged::SORT_TOPO).to_a

walk.each.with_index do |commit, i|
  commit.parents.each do |parent|
    diff = repo.diff(parent, commit, context_lines: 0)
  end

  if i % 1000 == 0
    puts "#{i} commits done. Cleaning."
    repo.close
    GC.start
  end
end

Running this script through memory_profiler tells me no Ruby objects are retained between iterations.
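
A minimal sketch of that kind of check, assuming the same repo and walk setup as in the script above:

require "memory_profiler"

report = MemoryProfiler.report do
  walk.first(1000).each do |commit|
    commit.parents.each { |parent| repo.diff(parent, commit, context_lines: 0) }
  end
end

# The "retained" sections list any Ruby objects still referenced after the block.
report.pretty_print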

I'll try running even more diffs and seeing if the memory continues to increase or if it plateaus.

alyssais avatar Apr 20 '17 11:04 alyssais

I've also tried modifying it to create new repository objects every so often, and that appears to make no difference compared to calling #close.

alyssais avatar Apr 20 '17 11:04 alyssais

@tenderlove I think the idea behind calling the method #close is to mirror IO#close.

arthurschreiber avatar Apr 20 '17 12:04 arthurschreiber

I've also tried modifying it to create new repository objects every so often, and that appears to have no difference compared to calling #close.

Ugh. I'll try working with this program and see what I can find. Are you seeing any GC stats rising? Specifically GC.stat(:heap_sorted_length) and GC.stat(:heap_live_slots).
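
Something like this inside your existing loop would print them each batch (a rough sketch, reusing the repo / walk setup from your script):

GC_KEYS = [:heap_sorted_length, :heap_live_slots, :heap_allocated_pages, :major_gc_count]

walk.each.with_index do |commit, i|
  commit.parents.each { |parent| repo.diff(parent, commit, context_lines: 0) }

  if i % 1000 == 0
    repo.close
    GC.start
    puts GC_KEYS.map { |k| "#{k}=#{GC.stat(k)}" }.join(" ")
  end
end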

@tenderlove I think the idea behind calling the method #close is to mirror IO#close.

@arthurschreiber I guess the difference is that IO#close makes the file unusable after you call it, whereas Repository#close allows you to continue using the repo object.

irb(main):012:0> f = File.open 'Rakefile'
=> #<File:Rakefile>
irb(main):013:0> f.close
=> nil
irb(main):014:0> f.read
IOError: closed stream
	from (irb):14:in `read'
	from (irb):14
	from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:11:in `<main>'
irb(main):015:0>

tenderlove avatar Apr 20 '17 15:04 tenderlove

Are you seeing any GC stats rising? Specifically GC.stat(:heap_sorted_length) and GC.stat(:heap_live_slots).

heap_sorted_length is stable at 300, and heap_live_slots is stable at 78594±1.

Here's the full data, with an entry after every 1000 commits:
count heap_allocated_pages heap_sorted_length heap_allocatable_pages heap_available_slots heap_live_slots heap_free_slots heap_final_slots heap_marked_slots heap_eden_pages heap_tomb_pages total_allocated_pages total_freed_pages total_allocated_objects total_freed_objects malloc_increase_bytes malloc_increase_bytes_limit minor_gc_count major_gc_count remembered_wb_unprotected_objects remembered_wb_unprotected_objects_limit old_objects old_objects_limit oldmalloc_increase_bytes oldmalloc_increase_bytes_limit
14 232 300 68 94563 78593 15970 0 78592 232 0 232 0 159838 81245 448 16777216 11 3 62396 124792 16096 32192 2354880 16777216
15 232 300 68 94563 78595 15968 0 78594 232 0 232 0 170301 91706 448 16777216 11 4 62396 124792 16096 32192 896 16777216
16 232 300 68 94563 78594 15969 0 78593 232 0 232 0 180666 102072 448 16777216 11 5 62397 124794 16171 32342 896 16777216
17 232 300 68 94563 78595 15968 0 78594 232 0 232 0 191360 112765 448 16777216 11 6 62397 124794 16171 32342 896 16777216
18 232 300 68 94563 78594 15969 0 78593 232 0 232 0 201935 123341 448 16777216 11 7 62397 124794 16171 32342 896 16777216
19 232 300 68 94563 78594 15969 0 78593 232 0 232 0 212706 134112 448 16777216 11 8 62397 124794 16171 32342 896 16777216
20 232 300 68 94563 78594 15969 0 78593 232 0 232 0 223218 144624 448 16777216 11 9 62397 124794 16171 32342 896 16777216
21 232 300 68 94563 78594 15969 0 78593 232 0 232 0 233842 155248 448 16777216 11 10 62397 124794 16171 32342 896 16777216
22 232 300 68 94563 78595 15968 0 78594 232 0 232 0 243486 164891 448 16777216 11 11 62397 124794 16171 32342 896 16777216
23 232 300 68 94563 78594 15969 0 78593 232 0 232 0 253963 175369 448 16777216 11 12 62397 124794 16171 32342 896 16777216
24 232 300 68 94563 78594 15969 0 78593 232 0 232 0 264034 185440 448 16777216 11 13 62397 124794 16171 32342 896 16777216
25 232 300 68 94563 78594 15969 0 78593 232 0 232 0 274532 195938 448 16777216 11 14 62397 124794 16171 32342 896 16777216
26 232 300 68 94563 78594 15969 0 78593 232 0 232 0 285226 206632 448 16777216 11 15 62397 124794 16171 32342 896 16777216
27 232 300 68 94563 78595 15968 0 78594 232 0 232 0 295584 216989 448 16777216 11 16 62397 124794 16171 32342 896 16777216
28 232 300 68 94563 78595 15968 0 78594 232 0 232 0 305697 227102 448 16777216 11 17 62397 124794 16171 32342 896 16777216
29 232 300 68 94563 78594 15969 0 78593 232 0 232 0 316153 237559 448 16777216 11 18 62397 124794 16171 32342 896 16777216
30 232 300 68 94563 78594 15969 0 78593 232 0 232 0 326119 247525 448 16777216 11 19 62397 124794 16171 32342 896 16777216
31 232 300 68 94563 78595 15968 0 78594 232 0 232 0 336127 257532 448 16777216 11 20 62397 124794 16171 32342 896 16777216
32 232 300 68 94563 78594 15969 0 78593 232 0 232 0 346653 268059 448 16777216 11 21 62397 124794 16171 32342 896 16777216
33 232 300 68 94563 78594 15969 0 78593 232 0 232 0 356878 278284 448 16777216 11 22 62397 124794 16171 32342 896 16777216
34 232 300 68 94563 78595 15968 0 78594 232 0 232 0 366991 288396 448 16777216 11 23 62397 124794 16171 32342 896 16777216
35 232 300 68 94563 78594 15969 0 78593 232 0 232 0 377132 298538 448 16777216 11 24 62397 124794 16171 32342 896 16777216
36 232 300 68 94563 78594 15969 0 78593 232 0 232 0 387252 308658 448 16777216 11 25 62397 124794 16171 32342 896 16777216
37 232 300 68 94563 78594 15969 0 78593 232 0 232 0 397085 318491 448 16777216 11 26 62397 124794 16171 32342 896 16777216
38 232 300 68 94563 78594 15969 0 78593 232 0 232 0 407408 328814 448 16777216 11 27 62397 124794 16171 32342 896 16777216
39 232 300 68 94563 78594 15969 0 78593 232 0 232 0 417381 338787 448 16777216 11 28 62397 124794 16171 32342 896 16777216
40 232 300 68 94563 78594 15969 0 78593 232 0 232 0 427529 348935 448 16777216 11 29 62397 124794 16171 32342 896 16777216
41 232 300 68 94563 78594 15969 0 78593 232 0 232 0 437390 358796 448 16777216 11 30 62397 124794 16171 32342 896 16777216
42 232 300 68 94563 78595 15968 0 78594 232 0 232 0 447153 368558 448 16777216 11 31 62397 124794 16171 32342 896 16777216
43 232 300 68 94563 78595 15968 0 78594 232 0 232 0 456671 378076 448 16777216 11 32 62397 124794 16171 32342 896 16777216
44 232 300 68 94563 78595 15968 0 78594 232 0 232 0 466406 387811 448 16777216 11 33 62397 124794 16171 32342 896 16777216
45 232 300 68 94563 78595 15968 0 78594 232 0 232 0 476057 397462 448 16777216 11 34 62397 124794 16171 32342 896 16777216
46 232 300 68 94563 78594 15969 0 78593 232 0 232 0 485715 407121 448 16777216 11 35 62397 124794 16171 32342 896 16777216
47 232 300 68 94563 78594 15969 0 78593 232 0 232 0 495548 416954 448 16777216 11 36 62397 124794 16171 32342 896 16777216
48 232 300 68 94563 78595 15968 0 78594 232 0 232 0 505528 426933 448 16777216 11 37 62397 124794 16171 32342 896 16777216
49 232 300 68 94563 78594 15969 0 78593 232 0 232 0 515375 436781 448 16777216 11 38 62397 124794 16171 32342 896 16777216
50 232 300 68 94563 78594 15969 0 78593 232 0 232 0 524977 446383 448 16777216 11 39 62397 124794 16171 32342 896 16777216
51 232 300 68 94563 78594 15969 0 78593 232 0 232 0 534628 456034 448 16777216 11 40 62397 124794 16171 32342 896 16777216
52 232 300 68 94563 78594 15969 0 78593 232 0 232 0 544139 465545 448 16777216 11 41 62397 124794 16171 32342 896 16777216
53 232 300 68 94563 78594 15969 0 78593 232 0 232 0 553769 475175 448 16777216 11 42 62397 124794 16171 32342 896 16777216
54 232 300 68 94563 78595 15968 0 78594 232 0 232 0 563553 484958 448 16777216 11 43 62397 124794 16171 32342 896 16777216
55 232 300 68 94563 78594 15969 0 78593 232 0 232 0 572294 493700 448 16777216 11 44 62397 124794 16171 32342 896 16777216
56 232 300 68 94563 78594 15969 0 78593 232 0 232 0 580615 502021 448 16777216 11 45 62397 124794 16171 32342 896 16777216
57 232 300 68 94563 78594 15969 0 78593 232 0 232 0 588908 510314 448 16777216 11 46 62397 124794 16171 32342 896 16777216
58 232 300 68 94563 78594 15969 0 78593 232 0 232 0 597117 518523 448 16777216 11 47 62397 124794 16171 32342 896 16777216
59 232 300 68 94563 78594 15969 0 78593 232 0 232 0 605522 526928 448 16777216 11 48 62397 124794 16171 32342 896 16777216
60 232 300 68 94563 78594 15969 0 78593 232 0 232 0 613906 535312 448 16777216 11 49 62397 124794 16171 32342 896 16777216
61 232 300 68 94563 78594 15969 0 78593 232 0 232 0 622290 543696 448 16777216 11 50 62397 124794 16171 32342 896 16777216
62 232 300 68 94563 78594 15969 0 78593 232 0 232 0 630660 552066 448 16777216 11 51 62397 124794 16171 32342 896 16777216
63 232 300 68 94563 78594 15969 0 78593 232 0 232 0 639149 560555 448 16777216 11 52 62397 124794 16171 32342 896 16777216
64 232 300 68 94563 78594 15969 0 78593 232 0 232 0 647799 569205 448 16777216 11 53 62397 124794 16171 32342 896 16777216
65 232 300 68 94563 78594 15969 0 78593 232 0 232 0 656092 577498 448 16777216 11 54 62397 124794 16171 32342 896 16777216
66 232 300 68 94563 78594 15969 0 78593 232 0 232 0 664630 586036 448 16777216 11 55 62397 124794 16171 32342 896 16777216
67 232 300 68 94563 78594 15969 0 78593 232 0 232 0 673112 594518 448 16777216 11 56 62397 124794 16171 32342 896 16777216
68 232 300 68 94563 78595 15968 0 78594 232 0 232 0 681727 603132 448 16777216 11 57 62397 124794 16171 32342 896 16777216
69 232 300 68 94563 78594 15969 0 78593 232 0 232 0 690195 611601 448 16777216 11 58 62397 124794 16171 32342 896 16777216
70 232 300 68 94563 78594 15969 0 78593 232 0 232 0 698628 620034 448 16777216 11 59 62397 124794 16171 32342 896 16777216
71 232 300 68 94563 78594 15969 0 78593 232 0 232 0 706662 628068 448 16777216 11 60 62397 124794 16171 32342 896 16777216
72 232 300 68 94563 78594 15969 0 78593 232 0 232 0 714696 636102 448 16777216 11 61 62397 124794 16171 32342 896 16777216
73 232 300 68 94563 78594 15969 0 78593 232 0 232 0 722730 644136 448 16777216 11 62 62397 124794 16171 32342 896 16777216
74 232 300 68 94563 78594 15969 0 78593 232 0 232 0 730764 652170 448 16777216 11 63 62397 124794 16171 32342 896 16777216
75 232 300 68 94563 78594 15969 0 78593 232 0 232 0 738798 660204 448 16777216 11 64 62397 124794 16171 32342 896 16777216
76 232 300 68 94563 78594 15969 0 78593 232 0 232 0 746832 668238 448 16777216 11 65 62397 124794 16171 32342 896 16777216

alyssais avatar Apr 20 '17 16:04 alyssais

A graph showing the same as the previous one, but with a much bigger repo (torvalds/linux — about 10x the number of commits). It's less noticeable because I loaded so many commits into memory up front, but there's still a definite trend upwards. The big drop near the start is when another process on my computer spun up and a lot of memory got written to swap, which must have interfered with the measurement somehow.

  20 +-+-------+---------+---------+----------+---------+---------+-------+-+   
     +         +         +         +          +         +         +      *  +   
  18 +-+                                                               ***+-+   
     *                                          ***                  *** *  |   
  16 *-+                                    ***** *       ************   *+-+   
     *                        **   **********     *********  *           *  |   
  14 *-+  **      *     ************              ***                    *+-+   
     * ************ *****                         *                      *  |   
  12 ***   ***    * *                                                    *+-+   
  10 **+          ***                                                    *+-+   
     **           **                                                     *  |   
   8 **+          **                                                     *+-+   
     **           **                                                     *  |   
   6 **+          **                                                     *+-+   
     **           **                                                     *  |   
   4 **+          **                                                     *+-+   
     **           **                                                     *  |   
   2 **+           *                                                     *+-+   
     *         +         +         +          +         +         +      *  +   
   0 *-+-------+---------+---------+----------+---------+---------+------*+-+   
     0       20000     40000     60000      80000     100000    120000    140000

alyssais avatar Apr 20 '17 17:04 alyssais

Awesome, thanks for the graph. I just wanted to comment and say I haven't forgotten about this, just been busy with life stuff (RailsConf last week, moving this week). I'll continue poking at this on Thursday. I believe @kivikakk was able to write a repro in C, so we might be able to eliminate Ruby from the equation.

tenderlove avatar May 03 '17 15:05 tenderlove

No worries. Thanks @tenderlove! 🙏

alyssais avatar May 03 '17 18:05 alyssais

@tenderlove unfortunately it was a non-repro in C 😞 I couldn't observe the ballooning memory usage without Ruby involved.

kivikakk avatar May 04 '17 02:05 kivikakk

Hi again,

Sorry it's been so long 😞. I think I've reproduced the problem, but I'm not sure what to believe. I'm going to describe my test methodology here a) so that I don't forget and b) so other people can try.

I'm using trunk Ruby compiled with -g so I have debugging symbols (I don't believe trunk Ruby is required, and neither is -g, but the debugging symbols make life easier). I'm running Rugged from master, and I recompiled it there so it picks up the -g flag too.

This is the script I'm using:

require "rugged"

puts "PID: #$$"

repo = Rugged::Repository.new(ARGV[0])

OID = 'df5ada78c0b0e962d65e89e7487fd35289dc9157'

GC.start
GC.start
GC.start

puts "Creating walk array"
walk = repo.walk(repo.head.target, Rugged::SORT_TOPO).to_a

walk.each.with_index do |commit, i|
  the_rents = commit.parents
  the_rents.each do |parent|
    diff = repo.diff(parent, commit, context_lines: 0)
  end

  if i % 1000 == 0
    puts "#{i} commits done. Cleaning."
    repo.close
    commit = nil
    GC.start
    $stdin.gets
  end
end

The way I'm running the script is like this:

$ pwd
/Users/aaron/github/rugged
$ MallocScribble=1 MallocStackLoggingNoCompact=1 /Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby --disable-gems -Ilib memgrow.rb ~/git/rails

MallocScribble isn't necessary, but MallocStackLoggingNoCompact is. Setting that environment variable records all calls to malloc regardless of requested size, along with the stack trace of the malloc call. When the process stops for some input, I run this command in a different terminal:

$ malloc_history -allBySize $PID > $SOME_FILE

Here $PID is the PID printed by memgrow.rb, and $SOME_FILE is a text file whose name I change each time I run malloc_history. malloc_history -allBySize outputs the stack and size of every live allocation, so everything printed by that command should be live memory. I record all live memory in $SOME_FILE, hit enter a few times (in this case twice), then run the malloc_history command again and record the live allocations in a new file. That way I can get an idea of what is live and changing over time.

I wrote a diff script to figure out what changed between two recordings:

def read_live_objects file
  f = File.open file
  while f.readline.chomp != '----'
  end
  f.readline
  list = []
  while line = f.readline
    case line
    when "Binary Images:\n"
      return list.group_by(&:last).transform_values { |v| v.map(&:first) }
    when "\n"
    else
      list << line.chomp.split(':', 2)
    end
  end
ensure
  f.close if f
end

def total_allocations allocs
  allocs.values.flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
end

def diff_allocations from, to
  puts "TOTAL HEAP SIZE (bytes) #{total_allocations(from)} => #{total_allocations(to)}"
  total_allocations from
  puts
  puts "NEW STACKS"
  (to.keys - from.keys).each do |key|
    to[key].each do |count|
      p count => key
    end
  end
  puts
  puts "REMOVED STACKS"
  (from.keys - to.keys).each do |key|
    from[key].each do |count|
      p count => key
    end
  end

  puts
  puts "CHANGED STACKS"
  (from.keys & to.keys).each do |key|
    if from[key].sort != to[key].sort
      bytes_from = from[key].flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
      bytes_to = to[key].flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
      p( (bytes_to - bytes_from) => key )
    end
  end
end

from = read_live_objects ARGV[0]
to = read_live_objects ARGV[1]

diff_allocations from, to

If I summed the live bytes from the allocation logs, I observed the size grow over time. However, if I just inspected the process with Activity Monitor, I could not observe the same thing (which is why I'm not sure which to believe). I took about four samples of the process and saw it grow by about 5MB each time (according to the allocation history logs).

The stack that increased the most on each iteration is this:

irb(main):008:0> puts x.grep_v(/^(vm_|invoke)/).each_with_index.map { |s,i| (" " * i) + s }.join("\n")
thread_7fffaffa03c0
 start
  main
   ruby_run_node
    ruby_exec_node
     ruby_exec_internal
      rb_iseq_eval_main
       call_cfunc_m1
        enumerator_with_index
         enumerator_block_call
          rb_block_call
           rb_iterate
            rb_iterate0
             iterate_method
              rb_call
               rb_call0
                call_cfunc_0
                 rb_ary_each
                  rb_yield
                   rb_yield_1
                    rb_yield_0
                     enumerator_with_index_i
                      rb_yield_values
                       rb_yield_0
                        call_cfunc_0
                         rb_ary_each
                          rb_yield
                           rb_yield_1
                            rb_yield_0
                             call_cfunc_4
                              rb_git_diff_tree_to_tree
                               rb_thread_call_without_gvl
                                call_without_gvl
                                 rb_git_diff_tree_to_tree_nogvl
                                  git_diff_tree_to_tree
                                   git_diff__from_iterators
                                    iterator_advance
                                     tree_iterator_advance
                                      tree_iterator_frame_push
                                       git_object_lookup_prefix
                                        git_odb_read
                                         odb_read_1
                                          pack_backend__read
                                           git_packfile_unpack
                                            git_delta_apply
                                             malloc

I suspect this malloc is the one that's leaking, but I can't tell how from the code, and leaks does not corroborate my findings. AFAICT the caller is responsible for freeing that memory, but it looks like there is some interesting logic around how (or whether) that happens.

tenderlove avatar Jun 14 '17 23:06 tenderlove

I tried running this again, but recorded the output of heap along with the output from malloc_history. heap seems to agree that the process size is growing as well.

tenderlove avatar Jun 14 '17 23:06 tenderlove