ruby-lsp Introduce guessed receiver types

Motivation

This PR adds an experiment, guessed receiver types, in which we try to guess the type of a receiver based on its identifier.

Implementation

The relevant part of the implementation is all in TypeInferrer. Everything else is just there to show users why we picked a certain type.

The idea is to try to guess the types like this:

1. Take the raw receiver slice
2. Sanitize that name by converting it to camel case and discarding @ symbols
3. Try to resolve the name inside the current nesting. If we find something, return that
4. Otherwise, search for the first type that matches the unqualified name of the identifier
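
As a rough illustration of the steps above (not the actual TypeInferrer code), here is a minimal sketch. KNOWN_TYPES, sanitize_receiver_name and guess_type are hypothetical names, and the flat list of constant names is only a stand-in for the real index:

```ruby
# Hypothetical stand-in for the index: a flat list of fully qualified names.
KNOWN_TYPES = ["Pathname", "Shop::Product", "Admin::User", "User"].freeze

# Turn a raw receiver slice such as "@user_account" into "UserAccount".
# Assumes snake_case identifiers.
def sanitize_receiver_name(raw)
  raw.delete("@").split("_").map(&:capitalize).join
end

# Guess a type for the receiver, given the current nesting (outermost first).
def guess_type(raw_receiver, nesting, known_types = KNOWN_TYPES)
  name = sanitize_receiver_name(raw_receiver)

  # First, try to resolve the name inside the current nesting, innermost
  # scope first, ending with the top level.
  nesting.length.downto(0) do |depth|
    candidate = (nesting.first(depth) + [name]).join("::")
    return candidate if known_types.include?(candidate)
  end

  # Otherwise, fall back to the first type whose unqualified name matches.
  known_types.find { |type| type.split("::").last == name }
end
```

With this toy index, guess_type("user", ["Admin"]) resolves through the nesting, while guess_type("product", []) only succeeds through the unqualified name fallback.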

More details in the Markdown documentation.

Validation

I used Spoom's access to the Sorbet LSP to compare the guessed types against the actual types reported by Sorbet. I also compared 4 approaches.

In the Ruby LSP repo, these are the accuracy results for each approach:

First resolve then fallback to unqualified name: 15% of correct types
Unqualified only: 11%
Resolve with nesting only: 9%
Fuzzy search: 2% (in addition to having the worst accuracy, fuzzy search was also unbearably slow)

In Core, the analysis script took way too long to finish, so I sampled a subset of the codebase. The results there were worse than in the Ruby LSP codebase, peaking at about 5% of correct types.

Surely, the level of accuracy will vary a lot between different codebases. That said, I still believe the experiment is worth a try and would love to hear feedback from users about how useful it is.

Script:

# typed: strict
# frozen_string_literal: true

require "spoom"
require "ruby_lsp/internal"

class Visitor < Prism::Visitor
  extend T::Sig

  sig { returns(T.nilable(RubyLsp::Document)) }
  attr_accessor :document

  sig { returns(Integer) }
  attr_reader :total, :correct

  sig { params(inferrer: RubyLsp::TypeInferrer, lsp_client: Spoom::LSP::Client).void }
  def initialize(inferrer, lsp_client)
    @inferrer = inferrer
    @lsp_client = lsp_client
    @total = T.let(0, Integer)
    @correct = T.let(0, Integer)
    @document = T.let(nil, T.nilable(RubyLsp::Document))
    super()
  end

  sig { params(node: Prism::CallNode).void }
  def visit_call_node(node)
    receiver_loc = node.receiver&.location
    return super unless receiver_loc

    receiver = node.receiver
    unless receiver.is_a?(Prism::CallNode) || receiver.is_a?(Prism::LocalVariableReadNode) ||
        receiver.is_a?(Prism::InstanceVariableReadNode)
      return super
    end

    hover = @lsp_client.hover(T.must(@document).uri.to_s, receiver_loc.start_line - 1, receiver_loc.start_column)

    if hover
      # Prefer the return type from a `returns(...)` signature; otherwise use the raw hover contents
      hovered_type = hover.contents[/returns\((.*)\)/, 1] || hover.contents

      return super if hovered_type == "T.untyped" || hovered_type == "T::Private::Methods::DeclBuilder"

      loc = T.must(node.message_loc)

      node_context = T.must(@document).locate_node(
        {
          line: loc.start_line - 1,
          character: loc.start_column,
        },
        node_types: [Prism::CallNode],
      )

      type = @inferrer.infer_receiver_type(node_context)

      @total += 1

      if type
        parts = type.split("::")
        parts.reject! { |e| e.include?("<Class:") }
        corrected_type = parts.join("::")

        if hovered_type.include?(corrected_type)
          @correct += 1
        end
      end
    end

    super
  end
end

index = RubyIndexer::Index.new
index.index_all

inferrer = RubyLsp::TypeInferrer.new(index)
workspace_path = Dir.pwd

client = Spoom::LSP::Client.new(
  Spoom::Sorbet::BIN_PATH,
  "--lsp",
  "--enable-all-experimental-lsp-features",
  "--disable-watchman",
)
client.open(workspace_path)

begin
  visitor = Visitor.new(inferrer, client)
  files = Dir.glob("#{workspace_path}/**/*.rb")
  RubyVM::YJIT.enable

  Signal.trap("INT") do
    puts "Total: #{visitor.total}"
    puts "Correct: #{visitor.correct}"
    puts "Accuracy: #{100 * (visitor.correct.to_f / visitor.total)}"
    client.close
    exit
  end

  files.each_with_index do |file, index|
    document = RubyLsp::RubyDocument.new(
      source: File.read(file),
      version: 1,
      uri: URI::Generic.from_path(path: File.expand_path(file)),
    )
    visitor.document = document
    Prism.parse_file(file).value.accept(visitor)

    print("\033[M\033[0KCompleted #{index + 1}/#{files.length}")
  end

  puts "Total: #{visitor.total}"
  puts "Correct: #{visitor.correct}"
  puts "Accuracy: #{100 * (visitor.correct.to_f / visitor.total)}"
ensure
  client.close
end

Automated Tests

Added tests.

Manual Tests

Type any existing class name as a variable. After typing a dot, you should see completion options for that type (e.g.: pathname.).

Jun 19 '24 20:06 vinistock

Another thing we could do, especially for the benefit of tests, is to match on a type name followed by a number, e.g. product_1 would be guessed as Product.
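
A minimal sketch of that idea, assuming a hypothetical strip_numeric_suffix helper that would run on the identifier before the name is sanitized (nothing like it exists in this PR):

```ruby
# Hypothetical helper: drop a trailing numeric suffix so that "product_1"
# and "product2" are both treated as "product" before guessing the type.
def strip_numeric_suffix(identifier)
  identifier.sub(/_?\d+\z/, "")
end
```

Identifiers without a trailing number are returned unchanged.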

Jun 19 '24 20:06 andyw8

Do you think we could package the script into a flag, like ruby-lsp --report-guess-type-accuracy (returning early if spoom is not available)? It would help us continuously evaluate this feature in the future, and we could ask community users who also use Sorbet to share their results too.

Jul 22 '24 17:07 st0012

Talked to Stan and we agreed to ship this and follow up with an executable to estimate the type accuracy of guessed types.

Jul 23 '24 15:07 vinistock