ruby-lsp
ruby-lsp copied to clipboard
Explore making indexing locations lazily fetched with comments
Currently, we eagerly capture all entry locations in the index. We may be able to improve both performance and memory usage if we delay that to only when entries are interacted with.
Entries already know which file they were discovered in and we already fetch their comments lazily, so we could very well build the indexer locations lazily too. Something like
class Entry
def comments
return @comments if @comments
fetch_lazy_data!
@comments
end
def location
return @location if @location
fetch_lazy_data!
@location
end
private
def fetch_lazy_data!
parse_result = Prism.parse_file(@file_path)
# Figure out the location for this entry
# Figure out comments
# Other things we fetch lazily
end
end
Here's some code that would do what you're effectively looking for. I think you should definitely go in this direction, as indexing speed is paramount to getting to interactive.
class IndexSource
attr_reader :filepath, :encoding, :locations
def initialize(filepath, encoding)
@filepath = filepath
@encoding = encoding
@locations = []
end
def location(location)
result = IndexLocation.new(self, location.start_offset, location.length)
locations << result
result
end
def reify
result = Prism.parse_file(filepath)
cache = result.code_units_cache(encoding)
locations.each do |location|
location.reify(result.source, cache)
end
end
end
class IndexLocation
attr_reader :start_offset, :length
attr_reader :start_line, :end_line, :start_code_units_column, :end_code_units_column
def initialize(source, start_offset, length)
@_source = source
@start_offset = start_offset
@length = length
end
def end_offset = start_offset + length
def start_line = (@start_line || (@_source.reify; @start_line))
def end_line = (@end_line || (@_source.reify; @end_line))
def start_code_units_column = (@start_code_units_column || (@_source.reify; @start_code_units_column))
def end_code_units_column = (@end_code_units_column || (@_source.reify; @end_code_units_column))
def reify(source, cache)
@_source = nil
@start_line = source.line(start_offset)
@end_line = source.line(end_offset)
@start_code_units_column = cache[start_offset] - cache[source.line_start(start_offset)]
@end_code_units_column = cache[end_offset] - cache[source.line_start(end_offset)]
end
end
What this does is create location objects off of IndexLocation objects that hold IndexSource objects. The Prism::IndexSource objects in turn have a list of locations that have been created from them. Then, the first time any of those locations actually needs to calculate an offset, all of the locations that point to the same source reify themselves at the same time (you could instead reparse for each entry if you wanted as well). You would use it like:
def index(filepath)
result = Prism.parse_file(filepath)
source = IndexSource.new(filepath, Encoding::UTF_16LE)
entries = []
queue = [result.value]
while (node = queue.shift)
case node
when Prism::ClassNode
entries << [:class, node.name, source.location(node.location)]
when Prism::ModuleNode
entries << [:module, node.name, source.location(node.location)]
end
queue.concat(node.compact_child_nodes)
end
entries
end
Obviously this is just for demonstration purposes. Effectively what this does is parse the file, walk the tree, and extract out class and module nodes, with their indexes converted into IndexLocation objects. Note that we do not need to hold onto the IndexSource object, since it will be cleaned up whenever the first location needs to calculate something.
Another possibility is to use the node_id functionality, which is how error highlight accomplishes this exact problem (it doesn't keep around the source code, it reparses when an error is raised). Every node has a node_id method that corresponds to a unique identifier that will be consistent across parses. If you do that, you can effectively retrieve the node that spawned the location in the first place, so you wouldn't even need to store the start offset and length, you would only need to store the individual integer.
We may be able to achieve this using Prism's new relocation API https://github.com/ruby/prism/pull/3183.
https://github.com/ruby/prism/blob/main/sample/prism/multiplex_constants.rb
This issue is being marked as stale because there was no activity in the last 2 months