Improve memory efficiency of Position class
Problem
https://github.com/castwide/solargraph/issues/1055 (extracted into an issue)
This sparked my interest in exploring why this is the case.
Good news here is that we don't seem to have a memory leak; however, we seem to have a huge memory bloat.
Approach
Avoid allocating additional strings, instead use sliced substrings - Use String#index ~~each_line~~ instead of String#lines
ruby-memory-profiler run --max 10 --pretty -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb
Before
Interesting bits:
Total allocated: 1.37 GB (14775646 objects)
Total retained: 136.90 MB (1527641 objects)
allocated memory by gem
-----------------------------------
955.06 MB solargraph/lib
137.73 MB other
82.55 MB prism-1.4.0
75.43 MB parser-3.3.9.0
42.17 MB set
24.82 MB ripper
17.08 MB yard-0.9.37
11.17 MB pathname
6.95 MB ast-2.4.3
5.70 MB rbs-3.9.4
allocated memory by file
-----------------------------------
308.94 MB solargraph/lib/solargraph/position.rb
207.20 MB solargraph/lib/solargraph/complex_type/unique_type.rb
124.32 MB <internal:marshal>
87.84 MB solargraph/lib/solargraph/complex_type.rb
65.70 MB parser-3.3.9.0/lib/parser/builders/default.rb
58.76 MB solargraph/lib/solargraph/source/cursor.rb
54.47 MB solargraph/lib/solargraph/parser/comment_ripper.rb
42.17 MB ruby/lib/set.rb
37.06 MB solargraph/lib/solargraph/pin_cache.rb
33.58 MB prism-1.4.0/lib/prism/translation/parser/compiler.rb
allocated memory by location
-----------------------------------
143.55 MB solargraph/lib/solargraph/complex_type/unique_type.rb:367
130.47 MB solargraph/lib/solargraph/position.rb:85
124.32 MB <internal:marshal>:34
109.17 MB solargraph/lib/solargraph/position.rb:62
67.87 MB solargraph/lib/solargraph/position.rb:87
49.13 MB solargraph/lib/solargraph/complex_type.rb:151
47.80 MB parser-3.3.9.0/lib/parser/builders/default.rb:1875
47.45 MB solargraph/lib/solargraph/parser/comment_ripper.rb:64
36.93 MB solargraph/lib/solargraph/pin_cache.rb:198
27.74 MB ruby/lib/set.rb:244
allocated memory by class
-----------------------------------
540.24 MB String
320.98 MB Array
266.11 MB Hash
55.74 MB Solargraph::ComplexType::UniqueType
21.22 MB Solargraph::Pin::Method
15.31 MB Prism::Location
13.78 MB Solargraph::Pin::Parameter
11.96 MB Parser::AST::Node
10.43 MB Parser::Source::Range
10.03 MB Solargraph::Position
After
Interesting bits:
Total allocated: 1.19 GB (12861604 objects)
Total retained: 136.93 MB (1528024 objects)
allocated memory by gem
-----------------------------------
773.30 MB solargraph/lib
137.73 MB other
82.71 MB prism-1.4.0
75.62 MB parser-3.3.9.0
42.19 MB set
24.89 MB ripper
17.08 MB yard-0.9.37
11.27 MB pathname
6.99 MB ast-2.4.3
5.73 MB rbs-3.9.4
allocated memory by file
-----------------------------------
207.20 MB solargraph/lib/solargraph/complex_type/unique_type.rb
126.81 MB solargraph/lib/solargraph/position.rb
124.32 MB <internal:marshal>
87.84 MB solargraph/lib/solargraph/complex_type.rb
65.87 MB parser-3.3.9.0/lib/parser/builders/default.rb
58.76 MB solargraph/lib/solargraph/source/cursor.rb
54.59 MB solargraph/lib/solargraph/parser/comment_ripper.rb
42.19 MB ruby/lib/set.rb
37.07 MB solargraph/lib/solargraph/pin_cache.rb
33.62 MB prism-1.4.0/lib/prism/translation/parser/compiler.rb
allocated memory by location
-----------------------------------
143.55 MB solargraph/lib/solargraph/complex_type/unique_type.rb:367
124.32 MB <internal:marshal>:34
68.50 MB solargraph/lib/solargraph/position.rb:91
56.88 MB solargraph/lib/solargraph/position.rb:63
49.14 MB solargraph/lib/solargraph/complex_type.rb:151
47.92 MB parser-3.3.9.0/lib/parser/builders/default.rb:1875
47.57 MB solargraph/lib/solargraph/parser/comment_ripper.rb:64
36.93 MB solargraph/lib/solargraph/pin_cache.rb:198
27.76 MB ruby/lib/set.rb:244
23.19 MB ruby/lib/ripper/sexp.rb:128
allocated memory by class
-----------------------------------
373.45 MB String
306.25 MB Array
266.43 MB Hash
55.74 MB Solargraph::ComplexType::UniqueType
21.23 MB Solargraph::Pin::Method
15.34 MB Prism::Location
13.78 MB Solargraph::Pin::Parameter
12.00 MB Parser::AST::Node
10.45 MB Parser::Source::Range
10.04 MB Solargraph::Position
Microbenchmark results: ruby tmp/position_from_offset_benchmark.rb
# tmp/position_from_offset_benchmark.rb
require 'bundler/inline'
# Ensure the required gems are installed
gemfile do
source 'https://rubygems.org'
gem 'benchmark-memory'
gem 'benchmark-ips'
end
large_string = (0..1_000).map { |i| i.to_s * 100 }.join
def from_offset_index text, offset
cursor = 0
line = 0
character = offset
newline_index = -1
while (newline_index = text.index("\n", newline_index + 1)) && newline_index < offset
line += 1
character = offset - newline_index - 1
end
character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end
def from_offset_each_char text, offset
cursor = 0
line = 0
character = nil
char_index = 0
text.each_char do |char|
if cursor == offset
character = char_index
break
end
if char == "\n"
line += 1
char_index = 0
else
char_index += 1
end
cursor += 1
end
character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end
def from_offset_each_line text, offset
cursor = 0
line = 0
character = nil
text.each_line do |l|
line_length = l.length
if l.end_with?("\n") || l.end_with?("\r\n")
char_length = line_length - 1
else
char_length = line_length
end
if cursor + char_length >= offset
character = offset - cursor
break
end
cursor += line_length
line += 1
end
character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end
def from_offset_old text, offset
cursor = 0
line = 0
character = nil
text.lines.each do |l|
line_length = l.length
char_length = l.chomp.length
if cursor + char_length >= offset
character = offset - cursor
break
end
cursor += line_length
line += 1
end
character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end
offset = 10_000_000
Benchmark.memory do |x|
x.report("string.lines.each") do
from_offset_old(large_string, offset)
end
x.report("string.index") do
from_offset_index(large_string, offset)
end
x.report("string.each_line") do
from_offset_each_line(large_string, offset)
end
x.report("string.each_char") do
from_offset_each_char(large_string, offset)
end
x.compare!
end
Benchmark.ips do |x|
x.report("string.lines.each") do
from_offset_old(large_string, offset)
end
x.report("string.index") do
from_offset_index(large_string, offset)
end
x.report("string.each_line") do
from_offset_each_line(large_string, offset)
end
x.report("string.each_char") do
from_offset_each_char(large_string, offset)
end
x.compare!
end
Calculating -------------------------------------
string.lines.each 289.562k memsize ( 289.442k retained)
4.000 objects ( 1.000 retained)
1.000 strings ( 1.000 retained)
string.index 40.000 memsize ( 0.000 retained)
1.000 objects ( 0.000 retained)
1.000 strings ( 0.000 retained)
string.each_line 120.000 memsize ( 0.000 retained)
3.000 objects ( 0.000 retained)
3.000 strings ( 0.000 retained)
string.each_char 23.152M memsize ( 0.000 retained)
578.800k objects ( 0.000 retained)
11.000 strings ( 0.000 retained)
Comparison:
string.index: 40 allocated
string.each_line: 120 allocated - 3.00x more
string.lines.each: 289562 allocated - 7239.05x more
string.each_char: 23152000 allocated - 578800.00x more
ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin22]
Warming up --------------------------------------
string.lines.each 14.497k i/100ms
string.index 14.833k i/100ms
string.each_line 13.165k i/100ms
string.each_char 2.000 i/100ms
Calculating -------------------------------------
string.lines.each 143.855k (± 1.8%) i/s - 724.850k in 5.040368s
string.index 151.539k (± 1.6%) i/s - 771.316k in 5.091280s
string.each_line 141.401k (± 8.1%) i/s - 710.910k in 5.069155s
string.each_char 27.332 (± 3.7%) i/s - 138.000 in 5.058226s
Comparison:
string.index: 151538.8 i/s
string.lines.each: 143855.4 i/s - 1.05x slower
string.each_line: 141401.4 i/s - same-ish: difference falls within error
string.each_char: 27.3 i/s - 5544.34x slower
While we unfortunately did not see a good impact on overall memory consumption when running solargraph stdio, I still thought it was worth committing this as a good start, maybe.
I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.
e..g you can use String#index(offset:) to find new lines one by one, etc.
index may be a bit slow with UTF-8, but you could potentially use byteindex https://docs.ruby-lang.org/en/master/String.html#method-i-byteindex
But again this is just blind suggestions based on the diff. I have no clue what this code does.
I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.
e..g you can use
String#index(offset:)to find new lines one by one, etc.
indexmay be a bit slow with UTF-8, but you could potentially usebyteindexdocs.ruby-lang.org/en/master/String.html#method-i-byteindexBut again this is just blind suggestions based on the diff. I have no clue what this code does.
@byroot 🚀🚀 https://github.com/castwide/solargraph/pull/1054/commits/5f698e0616c04c323dd40b215d0fc9d39bbe5991
Total consumption went down to Total allocated: 1.10 GB (11661287 objects) from:
Total allocated: 1.19 GB (12861604 objects)- compared to my initialString#each_lineapproachTotal allocated: 1.37 GB (14775646 objects)- compared to original implementation
And file memory consumption down to the bare minimum 🚀
allocated memory by file
-----------------------------------
1.42 MB solargraph/lib/solargraph/position.rb
(had to filter for this specific file only in order to appear in the report :D ruby-memory-profiler run --max 10 --pretty --allow_files lib/solargraph/position.rb -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb
compared to:
My initial approach
allocated memory by file
-----------------------------------
126.81 MB solargraph/lib/solargraph/position.rb
Original implementation
allocated memory by file
-----------------------------------
308.94 MB solargraph/lib/solargraph/position.rb
Today I learned a lot!
From what I can tell, @castwide, this improvement mainly benefits the typechecker, while the LSP Server isn’t affected as much. Hopefully, we can carry some of these learnings over there soon!
Hi @lekemula! This seems to fail currently when master is merged in :(