ruby-lsp icon indicating copy to clipboard operation
ruby-lsp copied to clipboard

Explore making indexing locations lazily fetched with comments

Open vinistock opened this issue 1 year ago • 4 comments

Currently, we eagerly capture all entry locations in the index. We may be able to improve both performance and memory usage if we delay that to only when entries are interacted with.

Entries already know which file they were discovered in and we already fetch their comments lazily, so we could very well build the indexer locations lazily too. Something like

class Entry
  def comments
    return @comments if @comments

    fetch_lazy_data!
    @comments
  end

  def location
    return @location if @location

    fetch_lazy_data!
    @location
  end

  private

  def fetch_lazy_data!
    parse_result = Prism.parse_file(@file_path)

    # Figure out the location for this entry
    # Figure out comments
    # Other things we fetch lazily
  end
end

vinistock avatar Oct 10 '24 18:10 vinistock

Here's some code that would do what you're effectively looking for. I think you should definitely go in this direction, as indexing speed is paramount to getting to interactive.

class IndexSource
  attr_reader :filepath, :encoding, :locations

  def initialize(filepath, encoding)
    @filepath = filepath
    @encoding = encoding
    @locations = []
  end

  def location(location)
    result = IndexLocation.new(self, location.start_offset, location.length)
    locations << result
    result
  end

  def reify
    result = Prism.parse_file(filepath)
    cache = result.code_units_cache(encoding)

    locations.each do |location|
      location.reify(result.source, cache)
    end
  end
end

class IndexLocation
  attr_reader :start_offset, :length
  attr_reader :start_line, :end_line, :start_code_units_column, :end_code_units_column

  def initialize(source, start_offset, length)
    @_source = source
    @start_offset = start_offset
    @length = length
  end

  def end_offset = start_offset + length
  def start_line = (@start_line || (@_source.reify; @start_line))
  def end_line = (@end_line || (@_source.reify; @end_line))
  def start_code_units_column = (@start_code_units_column || (@_source.reify; @start_code_units_column))
  def end_code_units_column = (@end_code_units_column || (@_source.reify; @end_code_units_column))

  def reify(source, cache)
    @_source = nil
    @start_line = source.line(start_offset)
    @end_line = source.line(end_offset)
    @start_code_units_column = cache[start_offset] - cache[source.line_start(start_offset)]
    @end_code_units_column = cache[end_offset] - cache[source.line_start(end_offset)]
  end
end

What this does is create location objects off of IndexLocation objects that hold IndexSource objects. The Prism::IndexSource objects in turn have a list of locations that have been created from them. Then, the first time any of those locations actually needs to calculate an offset, all of the locations that point to the same source reify themselves at the same time (you could instead reparse for each entry if you wanted as well). You would use it like:

def index(filepath)
  result = Prism.parse_file(filepath)
  source = IndexSource.new(filepath, Encoding::UTF_16LE)

  entries = []
  queue = [result.value]

  while (node = queue.shift)
    case node
    when Prism::ClassNode
      entries << [:class, node.name, source.location(node.location)]
    when Prism::ModuleNode
      entries << [:module, node.name, source.location(node.location)]
    end

    queue.concat(node.compact_child_nodes)
  end

  entries
end

Obviously this is just for demonstration purposes. Effectively what this does is parse the file, walk the tree, and extract out class and module nodes, with their indexes converted into IndexLocation objects. Note that we do not need to hold onto the IndexSource object, since it will be cleaned up whenever the first location needs to calculate something.

kddnewton avatar Oct 10 '24 19:10 kddnewton

Another possibility is to use the node_id functionality, which is how error highlight accomplishes this exact problem (it doesn't keep around the source code, it reparses when an error is raised). Every node has a node_id method that corresponds to a unique identifier that will be consistent across parses. If you do that, you can effectively retrieve the node that spawned the location in the first place, so you wouldn't even need to store the start offset and length, you would only need to store the individual integer.

kddnewton avatar Oct 10 '24 19:10 kddnewton

We may be able to achieve this using Prism's new relocation API https://github.com/ruby/prism/pull/3183.

vinistock avatar Oct 16 '24 17:10 vinistock

https://github.com/ruby/prism/blob/main/sample/prism/multiplex_constants.rb

kddnewton avatar Oct 16 '24 18:10 kddnewton

This issue is being marked as stale because there was no activity in the last 2 months

github-actions[bot] avatar Dec 16 '24 12:12 github-actions[bot]