Dart - request_document_symbols does not return complete range of symbol definitions

Open imanewman opened this issue 10 months ago • 6 comments

Today I've been working with the new Dart parser, and I really appreciate the community getting this added just as I encountered a project that required it.

However, it seems like the language server only returns the selection range of each symbol, not the full range of each symbol's definition. With every other parser I've used (Java, Python, TS/JS), I can get both the selection range of the symbol (i.e., the name of a function) and the range of the complete symbol definition (i.e., the whole function, including its body).

For my own case I have added a very crude workaround where I try to construct the missing range based on the given range of the symbol's definition. However, it isn't perfect and definitely isn't ideal when every other language server in multilspy supports this functionality out of the box.

I tried to dig around the code for the Dart implementation, and the Dart parser itself, but haven't been able to identify the source of this discrepancy in functionality, or how to reliably fix it.
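
To make the distinction concrete, here is a minimal sketch, assuming the two response shapes that the LSP specification allows for document symbols: `DocumentSymbol` (which carries both `range` and `selectionRange`) and `SymbolInformation` (which carries only `location.range`). The helper name `split_ranges` and the sample dictionaries are hypothetical and only illustrate how the missing full range shows up in the raw data.

```python
from typing import Any


def split_ranges(raw_symbol: dict[str, Any]) -> tuple[dict, dict, bool]:
    """Return (range, selection_range, has_full_range) for one raw symbol.

    Hypothetical helper: it only inspects which keys are present.
    """
    if "range" in raw_symbol:
        # DocumentSymbol shape: a full definition range plus the name range.
        full_range = raw_symbol["range"]
        return full_range, raw_symbol.get("selectionRange", full_range), True

    # SymbolInformation shape: a single location range and nothing else.
    location_range = raw_symbol["location"]["range"]
    return location_range, location_range, False


# Illustrative data only, not captured from a real server.
hierarchical = {
    "name": "greet",
    "kind": 12,
    "range": {"start": {"line": 3, "character": 0}, "end": {"line": 7, "character": 1}},
    "selectionRange": {"start": {"line": 3, "character": 5}, "end": {"line": 3, "character": 10}},
}
flat = {
    "name": "greet",
    "kind": 12,
    "location": {
        "uri": "file:///example.dart",
        "range": {"start": {"line": 3, "character": 5}, "end": {"line": 3, "character": 10}},
    },
}

hier_range, hier_selection, hier_has_full = split_ranges(hierarchical)
flat_range, flat_selection, flat_has_full = split_ranges(flat)
```

In the flat case the two returned ranges are the same object, which is the situation described above: there is nothing to tell the caller where the definition ends.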

imanewman avatar Mar 04 '25 23:03 imanewman

I suppose you may have already done so, but if not, can you have a look at #72 to see if this is possibly a configurable option that we did not add in v1 of Dart support in multilspy?

LakshyAAAgrawal avatar Mar 05 '25 00:03 LakshyAAAgrawal

Yeah, I took a look at that issue before posting this one. Based on the descriptions, I don't think any of those options are related, but I am not certain.

imanewman avatar Mar 05 '25 01:03 imanewman

Understood. I think this may be an issue with upstream? As in, the Dart LSP may not actually support that feature. Would you be able to create an issue in the dart repo, linking to this issue?

Also gentle ping to @v4rgas who kindly implemented the dart lsp support in multilspy, and may have some thoughts about this issue.
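
One thing that may be worth checking before filing upstream, offered as an assumption rather than a diagnosis: the LSP specification defines a client capability, `textDocument.documentSymbol.hierarchicalDocumentSymbolSupport`, and a server is allowed to fall back to flat `SymbolInformation` results when the client does not advertise it. Whether this applies to the Dart server or to the parameters multilspy sends has not been verified here. The helper name `advertises_hierarchical_symbols` and the sample dictionaries below are hypothetical; the sketch only shows how the flag could be inspected in an `initialize` request payload.

```python
from typing import Any


def advertises_hierarchical_symbols(initialize_params: dict[str, Any]) -> bool:
    """Check whether an LSP `initialize` payload opts in to DocumentSymbol results."""
    capabilities = initialize_params.get("capabilities", {})
    document_symbol = capabilities.get("textDocument", {}).get("documentSymbol", {})
    return bool(document_symbol.get("hierarchicalDocumentSymbolSupport", False))


# Illustrative payloads only; a real one contains many more fields.
flat_params = {"capabilities": {"textDocument": {"documentSymbol": {}}}}
hierarchical_params = {
    "capabilities": {
        "textDocument": {
            "documentSymbol": {"hierarchicalDocumentSymbolSupport": True}
        }
    }
}

flat_result = advertises_hierarchical_symbols(flat_params)
hierarchical_result = advertises_hierarchical_symbols(hierarchical_params)
```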

LakshyAAAgrawal avatar Mar 06 '25 19:03 LakshyAAAgrawal

Yeah, I was thinking that must be the issue, which does complicate the solution.

I have a very contrived hotfix where I essentially patch together the content from the given range, but I am sure that I did not account for all edge cases with my very crude approach to parsing the file. Here is what I currently do to create my pydantic model for symbols, though I feel like building this into this library would not be ideal:

class SymbolModel(BaseSchema):

    # ...

    @staticmethod
    def from_raw_symbol(
            raw_symbol: dict,
            path: str,
            file_content: str
    ) -> "SymbolModel":
        """
        Creates a symbol from raw symbol data returned from the language parser.

        :param raw_symbol: The raw symbol data.
        :param path: The path to the file where this symbol is found.
        :param file_content: The content of the file where this symbol is found.

        :return: A created symbol object.
        """
        full_range = raw_symbol["range"] if "range" in raw_symbol else raw_symbol["location"]["range"]
        selection_range = raw_symbol["selectionRange"] if "selectionRange" in raw_symbol else full_range

        range_start = full_range["start"]["line"]
        range_end = full_range["end"]["line"] + 1
        file_lines = file_content.split("\n")
        line_count = len(file_lines)
        symbol_content = "\n".join(file_lines[range_start:range_end])

        if "range" not in raw_symbol:
            # If the parser did not return the full symbol content, find the full content.
            # Target parser: This fixes an issue with the Dart parser.

            # Include mixin definitions in the symbol content.
            if range_end < line_count and file_lines[range_end].strip().startswith("with"):
                range_end = full_range["end"]["line"] = range_end + 1
                symbol_content = "\n".join(file_lines[range_start:range_end])

            # Include the full function definition in the symbol content.
            if symbol_content.strip().endswith("("):
                nested_blocks = 1

                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().startswith("//"):
                        continue

                    nested_blocks += file_lines[new_end].count("(") - file_lines[new_end].count(")")

                    if ")" in file_lines[new_end].strip() and nested_blocks == 0:
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

            # Include the full body content definition in the symbol content.
            if symbol_content.strip().endswith("{"):
                nested_blocks = 1

                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().startswith("//"):
                        continue

                    nested_blocks += file_lines[new_end].count("{") - file_lines[new_end].count("}")

                    if "}" in file_lines[new_end].strip() and nested_blocks == 0:
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

            # Include arrow function definitions in the symbol content.
            if symbol_content.strip().endswith("=>"):
                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().endswith(";"):
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

        name = raw_symbol["name"]

        if "containerName" in raw_symbol:
            name = f"{raw_symbol['containerName']}.{name}"

        return SymbolModel(
            name=name,
            type=SymbolType.from_number(raw_symbol["kind"]),
            path=path,
            range=full_range,
            selection_range=selection_range,
            content=symbol_content
        )
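
One edge case the line-based counting above may miss is a brace that appears inside a string literal or after a trailing `//` comment. A minimal sketch of a character-level scan follows, assuming single-line string literals and no block comments or raw strings; `find_block_end` is a hypothetical helper, not part of multilspy or of the hotfix above.

```python
from typing import Optional


def find_block_end(lines: list[str], start_line: int) -> Optional[int]:
    """Return the index of the line that closes the first `{` found at or after start_line."""
    depth = 0
    seen_open = False

    for index in range(start_line, len(lines)):
        line = lines[index]
        quote = None  # Active string delimiter; strings are assumed not to span lines.
        pos = 0

        while pos < len(line):
            char = line[pos]
            if quote:
                if char == "\\":
                    pos += 1  # Skip the escaped character.
                elif char == quote:
                    quote = None
            elif char in "'\"":
                quote = char
            elif line.startswith("//", pos):
                break  # Ignore the rest of the line.
            elif char == "{":
                depth += 1
                seen_open = True
            elif char == "}":
                depth -= 1
                if seen_open and depth == 0:
                    return index
            pos += 1

    return None


sample = [
    "void greet(String name) {",
    "  print('hello } $name'); // closing } in a comment",
    "  if (name.isEmpty) {",
    "    return;",
    "  }",
    "}",
    "void other() {}",
]

greet_end = find_block_end(sample, 0)
other_end = find_block_end(sample, 6)
```

The returned index is the last line of the block (inclusive), so a caller that slices with an exclusive end would add one to it.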

imanewman avatar Mar 06 '25 20:03 imanewman

Does the pydantic model match (closely, if not exactly) any of the MultilspyTypes? If yes, I think it would be a good contribution (with a few unit tests to track use) to https://github.com/microsoft/multilspy/blob/main/src/multilspy/multilspy_utils.py. What do you think?

LakshyAAAgrawal avatar Mar 06 '25 20:03 LakshyAAAgrawal

Yeah, it's essentially a wrapper around the symbol data returned from multilspy, with a few additions:

  • I store the content of each symbol rather than just the range.
  • I store references from each symbol to others, which I compile with request_references().
  • I have some helper methods for determining when one symbol is composed of other symbols, making it their parent (such as a class and its methods).

I can share my code here for now, and when I have some room on my plate I could try making a contribution. My code is a bit more tailored to what I am creating with multilspy, so if you have any feedback on which parts would make a fitting contribution and which are too domain-specific, let me know.
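
As an illustration of the parent/child idea in the list above, here is a minimal sketch that assigns each symbol the tightest enclosing symbol in the same file as its parent. It uses plain dataclasses instead of the pydantic models, and the names `Span`, `encloses`, and `assign_parents` are hypothetical; it compares line numbers only, which is an assumption that breaks down when two symbols share a line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    path: str
    start_line: int
    end_line: int
    parent: Optional[str] = None

    @property
    def length(self) -> int:
        return self.end_line - self.start_line


def encloses(outer: Span, inner: Span) -> bool:
    """True if `outer` strictly contains `inner` within the same file."""
    return (
        outer is not inner
        and outer.path == inner.path
        and outer.start_line <= inner.start_line
        and outer.end_line >= inner.end_line
        and outer.length > inner.length  # Identical ranges never parent each other.
    )


def assign_parents(spans: list[Span]) -> None:
    """Set each span's parent to the name of its smallest enclosing span."""
    for span in spans:
        candidates = [other for other in spans if encloses(other, span)]
        if candidates:
            span.parent = min(candidates, key=lambda s: s.length).name


file_span = Span("lib/a.dart", "lib/a.dart", 0, 20)
class_span = Span("Greeter", "lib/a.dart", 2, 10)
method_span = Span("Greeter.greet", "lib/a.dart", 4, 6)
function_span = Span("main", "lib/a.dart", 12, 15)

assign_parents([file_span, class_span, method_span, function_span])
```

This only works when the full definition range is available, which is why the missing range from the Dart server matters for this use case.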

Data Models

from typing import Union
from pydantic import ConfigDict, BaseModel, Field
from uuid import UUID, uuid4
from enum import IntEnum


class IdModel(BaseModel):
    """
    A model for all objects with a unique ID.
    """

    # Assign a unique ID for all objects.
    id: UUID = Field(
        default_factory=uuid4,
        title="Identifier",
        description="The object's unique ID."
    )


class FilePointerModel(IdModel):
    """
    Represents a pointer to a specific character in a file.
    """
    line: int = Field(
        default=0,
        title="Line",
        description="A line # within a file."
    )
    character: int = Field(
        default=0,
        title="Starting Character",
        description="A character # within a line."
    )


class FileRangeModel(IdModel):
    """
    Represents a range within a file between two pointers.
    """
    start: FilePointerModel = Field(
        default_factory=lambda: FilePointerModel(),
        title="Start",
        description="The start of the range."
    )
    end: FilePointerModel = Field(
        default_factory=lambda: FilePointerModel(),
        title="End",
        description="The end of the range."
    )

    def __eq__(self, other):
        return self.start.line == other.start.line \
               and self.start.character == other.start.character \
               and self.end.line == other.end.line \
               and self.end.character == other.end.character


class SymbolRefModel(IdModel):
    """
    Represents a reference to a symbol within a code file.
    """
    path: str = Field(
        default=None,
        title="File Path",
        description="The path to the file being referenced."
    )
    range: FileRangeModel = Field(
        default=None,
        title="Range",
        description="The range within the file of this reference."
    )

    def __str__(self):
        return f"{self.path} " \
               f"({self.range.start.line}:{self.range.start.character})"

    def __hash__(self):
        return hash(str(self))


class SymbolType(IntEnum):
    """
    Represents the types of code symbols.
    Mirrors: multilspy/lsp_protocol_handler/lsp_types.py#SymbolKind
    """

    File = 1
    Module = 2
    Namespace = 3
    Package = 4
    Class = 5
    Method = 6
    Property = 7
    Field = 8
    Constructor = 9
    Enum = 10
    Interface = 11
    Function = 12
    Variable = 13
    Constant = 14
    String = 15
    Number = 16
    Boolean = 17
    Array = 18
    Object = 19
    Key = 20
    Null = 21
    EnumMember = 22
    Struct = 23
    Event = 24
    Operator = 25
    TypeParameter = 26

    @staticmethod
    def from_number(number: int) -> "SymbolType":
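        # Look up the member by value; raises ValueError for an unknown kind
        # instead of silently wrapping around (e.g. 0 mapping to the last member).
        return SymbolType(number)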


class SymbolModel(IdModel):
    """
    Represents a symbol within a code file,
    including its name and file location.
    """
    name: str = Field(
        default=None,
        title="Name",
        description="The name of the symbol."
    )
    type: SymbolType = Field(
        default=None,
        title="Type",
        description="The type of data represented by this symbol."
    )

    path: str = Field(
        default=None,
        title="File Path",
        description="The path to the file where this symbol is found."
    )
    range: FileRangeModel = Field(
        default_factory=lambda: FileRangeModel(),
        title="Range",
        description="The range within the file of this symbol's definition."
    )
    selection_range: FileRangeModel = Field(
        default_factory=lambda: FileRangeModel(),
        title="Selection Range",
        description="The range within the file of this symbol's name."
    )

    content: str = Field(
        default=None,
        title="Symbol Content",
        description="The text content for this symbol."
    )

    references: list[SymbolRefModel] = Field(
        default_factory=list,
        title="References",
        description="A list of locations where this symbol is referenced."
    )
    reference_ids: list[UUID] = Field(
        default_factory=list,
        title="Reference IDs",
        description="The ID of symbols that include a reference to this symbol."
    )
    parent_id: UUID | None = Field(
        default=None,
        title="Parent ID",
        description="The ID of this symbol's parent symbol, if one exists."
    )

    def __str__(self):
        return f"{self.name} ({SymbolType.from_number(self.type).name})\n" \
               f"- File: {self.path} " \
               f"(Range: {self.range.start.line}:{self.range.start.character}" \
               f" -> {self.range.end.line}:{self.range.end.character}) " \
               f"(Selection: {self.selection_range.start.line}:{self.selection_range.start.character}" \
               f" -> {self.selection_range.end.line}:{self.selection_range.end.character})\n" \
               f"- Refs: {', '.join([str(ref) for ref in self.references])}"

    def add_references(self, raw_refs: list[dict]) -> None:
        """
        Creates and appends a symbol reference for each reference returned from the language parser.

        :param raw_refs: The raw reference data for this symbol's usage.
        """
        for raw_ref in raw_refs:
            symbol_ref = SymbolRefModel(
                path=raw_ref["relativePath"],
                range=raw_ref["range"],
            )

            # Ignore references that are the same as the symbol's definition.
            if symbol_ref.range == self.selection_range:
                continue

            self.references.append(symbol_ref)

    def is_parent_of(self, other: Union["SymbolModel", SymbolRefModel]) -> bool:
        """
        Checks if this symbol is a parent of another symbol.

        :param other: The other symbol to check.

        :return: True if this symbol is a parent of the other symbol.
        """
        return self.path == other.path \
            and self.range.start.line <= other.range.start.line \
            and self.range.end.line >= other.range.end.line

    @staticmethod
    def from_file(
            path: str,
            file_content: str
    ) -> "SymbolModel":
        """
        Creates a symbol out of an entire file.

        :param path: The path to the file where this symbol is found.
        :param file_content: The content of the file where this symbol is found.

        :return: A created symbol object.
        """
        range = {
            "start": {"line": 0, "character": 0},
            "end": {"line": len(file_content.split("\n")), "character": 0}
        }

        return SymbolModel(
            name=path,
            type=SymbolType.File,
            path=path,
            range=range,
            selection_range=range,
            content=file_content
        )

    @staticmethod
    def from_raw_symbol(
            raw_symbol: dict,
            path: str,
            file_content: str
    ) -> "SymbolModel":
        """
        Creates a symbol from raw symbol data returned from the language parser.

        :param raw_symbol: The raw symbol data.
        :param path: The path to the file where this symbol is found.
        :param file_content: The content of the file where this symbol is found.

        :return: A created symbol object.
        """
        full_range = raw_symbol["range"] if "range" in raw_symbol else raw_symbol["location"]["range"]
        selection_range = raw_symbol["selectionRange"] if "selectionRange" in raw_symbol else full_range

        range_start = full_range["start"]["line"]
        range_end = full_range["end"]["line"] + 1
        file_lines = file_content.split("\n")
        line_count = len(file_lines)
        symbol_content = "\n".join(file_lines[range_start:range_end])

        if "range" not in raw_symbol:
            # If the parser did not return the full symbol content, find the full content.
            # Target parser: This fixes an issue with the Dart parser.

            # Include mixin definitions in the symbol content.
            if range_end < line_count and file_lines[range_end].strip().startswith("with"):
                range_end = full_range["end"]["line"] = range_end + 1
                symbol_content = "\n".join(file_lines[range_start:range_end])

            # Include the full function definition in the symbol content.
            if symbol_content.strip().endswith("("):
                nested_blocks = 1

                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().startswith("//"):
                        continue

                    nested_blocks += file_lines[new_end].count("(") - file_lines[new_end].count(")")

                    if ")" in file_lines[new_end].strip() and nested_blocks == 0:
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

            # Include the full body content definition in the symbol content.
            if symbol_content.strip().endswith("{"):
                nested_blocks = 1

                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().startswith("//"):
                        continue

                    nested_blocks += file_lines[new_end].count("{") - file_lines[new_end].count("}")

                    if "}" in file_lines[new_end].strip() and nested_blocks == 0:
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

            # Include arrow function definitions in the symbol content.
            if symbol_content.strip().endswith("=>"):
                for new_end in range(range_end, line_count):
                    if file_lines[new_end].strip().endswith(";"):
                        range_end = full_range["end"]["line"] = new_end + 1
                        symbol_content = "\n".join(file_lines[range_start:range_end])
                        break

        name = raw_symbol["name"]

        if "containerName" in raw_symbol:
            name = f"{raw_symbol['containerName']}.{name}"

        return SymbolModel(
            name=name,
            type=SymbolType.from_number(raw_symbol["kind"]),
            path=path,
            range=full_range,
            selection_range=selection_range,
            content=symbol_content
        )


class FileTreeModel(IdModel):
    """
    A file tree captures files within a codebase.
    """
    file_names: list[str] = Field(
        default_factory=list,
        title="File Names",
        description="A list of all code file names. Relative to the codebase root directory."
    )
    file_data: dict[str, str] = Field(
        default_factory=dict,
        title="File Data",
        description="A mapping of file names to their content."
    )


class SymbolTreeModel(FileTreeModel):
    """
    A symbol tree captures the parsed symbols throughout a codebase.
    """
    symbol_map: dict[str, SymbolModel] = Field(
        default_factory=dict,
        title="Symbols",
        description="All parsed symbols within this codebase."
    )

    def add_symbol(self, symbol: SymbolModel):
        """
        Add a symbol to the symbol tree.
        """
        self.symbol_map[str(symbol.id)] = symbol

    def get_symbols(
            self,
            symbol_types: list[SymbolType],
    ) -> list[SymbolModel]:
        """
        Returns symbols of the given types.

        :param symbol_types: The types of symbols to return.

        :return: A list of symbols.
        """
        symbols = []
        symbol_type_values = [symbol_type.value for symbol_type in symbol_types]

        for symbol in self.symbol_map.values():
            if symbol.type in symbol_type_values:
                symbols.append(symbol)

        return symbols

I combined a few files to put this together, so there may be some small errors or missing imports.

imanewman avatar Mar 06 '25 20:03 imanewman