solargraph icon indicating copy to clipboard operation
solargraph copied to clipboard

Improve memory efficiency of Position class

Open lekemula opened this issue 3 months ago • 4 comments

Problem

https://github.com/castwide/solargraph/issues/1055 (extracted into an issue)

This sparked my interest in exploring why this is the case.

Good news here is that we don't seem to have a memory leak; however, we seem to have a huge memory bloat.

Approach

Avoid allocating additional strings, instead use sliced substrings - Use String#index ~~each_line~~ instead of String#lines

ruby-memory-profiler run --max 10 --pretty -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb

Before

Interesting bits:

Total allocated: 1.37 GB (14775646 objects)
Total retained:  136.90 MB (1527641 objects)

allocated memory by gem
-----------------------------------
 955.06 MB  solargraph/lib
 137.73 MB  other
  82.55 MB  prism-1.4.0
  75.43 MB  parser-3.3.9.0
  42.17 MB  set
  24.82 MB  ripper
  17.08 MB  yard-0.9.37
  11.17 MB  pathname
   6.95 MB  ast-2.4.3
   5.70 MB  rbs-3.9.4

allocated memory by file
-----------------------------------
 308.94 MB  solargraph/lib/solargraph/position.rb
 207.20 MB  solargraph/lib/solargraph/complex_type/unique_type.rb
 124.32 MB  <internal:marshal>
  87.84 MB  solargraph/lib/solargraph/complex_type.rb
  65.70 MB  parser-3.3.9.0/lib/parser/builders/default.rb
  58.76 MB  solargraph/lib/solargraph/source/cursor.rb
  54.47 MB  solargraph/lib/solargraph/parser/comment_ripper.rb
  42.17 MB  ruby/lib/set.rb
  37.06 MB  solargraph/lib/solargraph/pin_cache.rb
  33.58 MB  prism-1.4.0/lib/prism/translation/parser/compiler.rb

allocated memory by location
-----------------------------------
 143.55 MB  solargraph/lib/solargraph/complex_type/unique_type.rb:367
 130.47 MB  solargraph/lib/solargraph/position.rb:85
 124.32 MB  <internal:marshal>:34
 109.17 MB  solargraph/lib/solargraph/position.rb:62
  67.87 MB  solargraph/lib/solargraph/position.rb:87
  49.13 MB  solargraph/lib/solargraph/complex_type.rb:151
  47.80 MB  parser-3.3.9.0/lib/parser/builders/default.rb:1875
  47.45 MB  solargraph/lib/solargraph/parser/comment_ripper.rb:64
  36.93 MB  solargraph/lib/solargraph/pin_cache.rb:198
  27.74 MB  ruby/lib/set.rb:244

allocated memory by class
-----------------------------------
 540.24 MB  String
 320.98 MB  Array
 266.11 MB  Hash
  55.74 MB  Solargraph::ComplexType::UniqueType
  21.22 MB  Solargraph::Pin::Method
  15.31 MB  Prism::Location
  13.78 MB  Solargraph::Pin::Parameter
  11.96 MB  Parser::AST::Node
  10.43 MB  Parser::Source::Range
  10.03 MB  Solargraph::Position

After

Interesting bits:

Total allocated: 1.19 GB (12861604 objects)
Total retained:  136.93 MB (1528024 objects)

allocated memory by gem
-----------------------------------
 773.30 MB  solargraph/lib
 137.73 MB  other
  82.71 MB  prism-1.4.0
  75.62 MB  parser-3.3.9.0
  42.19 MB  set
  24.89 MB  ripper
  17.08 MB  yard-0.9.37
  11.27 MB  pathname
   6.99 MB  ast-2.4.3
   5.73 MB  rbs-3.9.4

allocated memory by file
-----------------------------------
 207.20 MB  solargraph/lib/solargraph/complex_type/unique_type.rb
 126.81 MB  solargraph/lib/solargraph/position.rb
 124.32 MB  <internal:marshal>
  87.84 MB  solargraph/lib/solargraph/complex_type.rb
  65.87 MB  parser-3.3.9.0/lib/parser/builders/default.rb
  58.76 MB  solargraph/lib/solargraph/source/cursor.rb
  54.59 MB  solargraph/lib/solargraph/parser/comment_ripper.rb
  42.19 MB  ruby/lib/set.rb
  37.07 MB  solargraph/lib/solargraph/pin_cache.rb
  33.62 MB  prism-1.4.0/lib/prism/translation/parser/compiler.rb

allocated memory by location
-----------------------------------
 143.55 MB  solargraph/lib/solargraph/complex_type/unique_type.rb:367
 124.32 MB  <internal:marshal>:34
  68.50 MB  solargraph/lib/solargraph/position.rb:91
  56.88 MB  solargraph/lib/solargraph/position.rb:63
  49.14 MB  solargraph/lib/solargraph/complex_type.rb:151
  47.92 MB  parser-3.3.9.0/lib/parser/builders/default.rb:1875
  47.57 MB  solargraph/lib/solargraph/parser/comment_ripper.rb:64
  36.93 MB  solargraph/lib/solargraph/pin_cache.rb:198
  27.76 MB  ruby/lib/set.rb:244
  23.19 MB  ruby/lib/ripper/sexp.rb:128

allocated memory by class
-----------------------------------
 373.45 MB  String
 306.25 MB  Array
 266.43 MB  Hash
  55.74 MB  Solargraph::ComplexType::UniqueType
  21.23 MB  Solargraph::Pin::Method
  15.34 MB  Prism::Location
  13.78 MB  Solargraph::Pin::Parameter
  12.00 MB  Parser::AST::Node
  10.45 MB  Parser::Source::Range
  10.04 MB  Solargraph::Position

Microbenchmark results: ruby tmp/position_from_offset_benchmark.rb

# tmp/position_from_offset_benchmark.rb

require 'bundler/inline'

# Ensure the required gems are installed
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-memory'
  gem 'benchmark-ips'
end


large_string = (0..1_000).map { |i| i.to_s * 100 }.join

def from_offset_index text, offset
  cursor = 0
  line = 0
  character = offset
  newline_index = -1

  while (newline_index = text.index("\n", newline_index + 1)) && newline_index < offset
    line += 1
    character = offset - newline_index - 1
  end

  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

def from_offset_each_char text, offset
  cursor = 0
  line = 0
  character = nil
  char_index = 0
  text.each_char do |char|
    if cursor == offset
      character = char_index
      break
    end

    if char == "\n"
      line += 1
      char_index = 0
    else
      char_index += 1
    end

    cursor += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

def from_offset_each_line text, offset
  cursor = 0
  line = 0
  character = nil
  text.each_line do |l|
    line_length = l.length

    if l.end_with?("\n") || l.end_with?("\r\n")
      char_length = line_length - 1
    else
      char_length = line_length
    end

    if cursor + char_length >= offset
      character = offset - cursor
      break
    end
    cursor += line_length
    line += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end


def from_offset_old text, offset
  cursor = 0
  line = 0
  character = nil
  text.lines.each do |l|
    line_length = l.length
    char_length = l.chomp.length
    if cursor + char_length >= offset
      character = offset - cursor
      break
    end
    cursor += line_length
    line += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

offset = 10_000_000

Benchmark.memory do |x|
  x.report("string.lines.each") do
    from_offset_old(large_string, offset)
  end

  x.report("string.index") do
    from_offset_index(large_string, offset)
  end

  x.report("string.each_line") do
    from_offset_each_line(large_string, offset)
  end

  x.report("string.each_char") do
    from_offset_each_char(large_string, offset)
  end

  x.compare!
end


Benchmark.ips do |x|
  x.report("string.lines.each") do
    from_offset_old(large_string, offset)
  end

  x.report("string.index") do
    from_offset_index(large_string, offset)
  end

  x.report("string.each_line") do
    from_offset_each_line(large_string, offset)
  end

  x.report("string.each_char") do
    from_offset_each_char(large_string, offset)
  end

  x.compare!
end

Calculating -------------------------------------
   string.lines.each   289.562k memsize (   289.442k retained)
                         4.000  objects (     1.000  retained)
                         1.000  strings (     1.000  retained)
        string.index    40.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
    string.each_line   120.000  memsize (     0.000  retained)
                         3.000  objects (     0.000  retained)
                         3.000  strings (     0.000  retained)
    string.each_char    23.152M memsize (     0.000  retained)
                       578.800k objects (     0.000  retained)
                        11.000  strings (     0.000  retained)

Comparison:
        string.index:         40 allocated
    string.each_line:        120 allocated - 3.00x more
   string.lines.each:     289562 allocated - 7239.05x more
    string.each_char:   23152000 allocated - 578800.00x more
ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin22]
Warming up --------------------------------------
   string.lines.each    14.497k i/100ms
        string.index    14.833k i/100ms
    string.each_line    13.165k i/100ms
    string.each_char     2.000 i/100ms
Calculating -------------------------------------
   string.lines.each    143.855k (± 1.8%) i/s -    724.850k in   5.040368s
        string.index    151.539k (± 1.6%) i/s -    771.316k in   5.091280s
    string.each_line    141.401k (± 8.1%) i/s -    710.910k in   5.069155s
    string.each_char     27.332 (± 3.7%) i/s -    138.000 in   5.058226s

Comparison:
        string.index:   151538.8 i/s
   string.lines.each:   143855.4 i/s - 1.05x  slower
    string.each_line:   141401.4 i/s - same-ish: difference falls within error
    string.each_char:       27.3 i/s - 5544.34x  slower

While we unfortunately did not see a good impact on overall memory consumption when running solargraph stdio, I still thought it was worth committing this as a good start, maybe.

lekemula avatar Aug 31 '25 10:08 lekemula

I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.

e..g you can use String#index(offset:) to find new lines one by one, etc.

index may be a bit slow with UTF-8, but you could potentially use byteindex https://docs.ruby-lang.org/en/master/String.html#method-i-byteindex

But again this is just blind suggestions based on the diff. I have no clue what this code does.

byroot avatar Aug 31 '25 10:08 byroot

I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.

e..g you can use String#index(offset:) to find new lines one by one, etc.

index may be a bit slow with UTF-8, but you could potentially use byteindex docs.ruby-lang.org/en/master/String.html#method-i-byteindex

But again this is just blind suggestions based on the diff. I have no clue what this code does.

@byroot 🚀🚀 https://github.com/castwide/solargraph/pull/1054/commits/5f698e0616c04c323dd40b215d0fc9d39bbe5991

giphy

Total consumption went down to Total allocated: 1.10 GB (11661287 objects) from:

  • Total allocated: 1.19 GB (12861604 objects) - compared to my initial String#each_line approach
  • Total allocated: 1.37 GB (14775646 objects) - compared to original implementation

And file memory consumption down to the bare minimum 🚀

allocated memory by file
-----------------------------------
   1.42 MB  solargraph/lib/solargraph/position.rb

(had to filter for this specific file only in order to appear in the report :D ruby-memory-profiler run --max 10 --pretty --allow_files lib/solargraph/position.rb -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb

compared to:

My initial approach

allocated memory by file
-----------------------------------
 126.81 MB  solargraph/lib/solargraph/position.rb

Original implementation

allocated memory by file
-----------------------------------
 308.94 MB  solargraph/lib/solargraph/position.rb

Today I learned a lot!

lekemula avatar Aug 31 '25 21:08 lekemula

From what I can tell, @castwide, this improvement mainly benefits the typechecker, while the LSP Server isn’t affected as much. Hopefully, we can carry some of these learnings over there soon!

lekemula avatar Aug 31 '25 21:08 lekemula

Hi @lekemula! This seems to fail currently when master is merged in :(

apiology avatar Oct 05 '25 23:10 apiology