
feature: Docstring method for computing source line and column given offset into value string

Open analog-cbarber opened this issue 9 months ago • 4 comments

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

I would like to be able to report the correct line and column for errors in cross-references embedded in a docstring, as part of the mkdocstrings-python-xref plugin. Ideally I would like a method like:

    def value_offset_to_line_col(self, offset: int) -> tuple[int, int]:
        """
        Convert an offset into the docstring's value string into the corresponding line and column in the source file.

        Returns:
            Line and column, or (-1, -1) if there is no source information to compute them from.
        """
        ...

Additional context

I am just going to try to implement such a function in my project and if there is interest, will submit a PR here.

analog-cbarber avatar Mar 22 '25 22:03 analog-cbarber

Thanks for the feature request @analog-cbarber 🙂

About the offset parameter: it's an offset in the cleaned, dedented docstring, right? And you want the function to convert it to the offset in the source docstring. What would you think of a property that simply returns the source docstring base offset instead? This way you can simply add the offset delta to your own offset, and we don't need a method with arguments.

source_offset = your_offset + docstring.source_offset

Something like this 🤔 But maybe I'm misunderstanding a few things here.

Also I'd be happy to provide this kind of information natively, for example by storing column offsets everywhere it makes sense, and by returning them from the docstring parsers too, when errors are found. (But maybe you're doing your own parsing here so it wouldn't help.)

pawamoy avatar Mar 23 '25 10:03 pawamoy

As I said, given a cleaned up docstring, I need to be able to convert any offset into that string into correct line and column.

This is non-trivial because the docstring cleanup removes leading newlines and reindents the text, and how the reindentation happens depends on whether the first line of the docstring is on the same line as the opening quotes.
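To illustrate why a single base offset is not enough, here is a small sketch. It assumes the cleanup behaves like the standard library's `inspect.cleandoc`; the sample docstring text and the variable names are made up for illustration.

```python
import inspect

# The same docstring text, written two ways between the triple quotes.
# Layout 1: text starts on the line after the opening quotes.
raw_next_line = "\n    Summary line.\n\n    See [bad.ref][] here.\n    "
# Layout 2: text starts on the same line as the opening quotes.
raw_same_line = "Summary line.\n\n    See [bad.ref][] here.\n    "

clean_next = inspect.cleandoc(raw_next_line)
clean_same = inspect.cleandoc(raw_same_line)

# Both layouts clean up to the same string ...
assert clean_next == clean_same == "Summary line.\n\nSee [bad.ref][] here."

# ... but the cross-reference sits at a different offset in each raw value.
target = "[bad.ref]"
print(clean_next.index(target))     # 19: offset in the cleaned value
print(raw_next_line.index(target))  # 28: offset in the raw value, layout 1
print(raw_same_line.index(target))  # 23: offset in the raw value, layout 2
```

The delta between cleaned and raw offsets is 9 in one layout and 4 in the other, and it also changes from line to line within a docstring, since each line loses its own indentation.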

Here is what I came up with for use in mkdocstrings-python-xref:

import re
from ast import literal_eval

from griffe import Docstring


def doc_value_offset_to_location(doc: Docstring, offset: int) -> tuple[int,int]:
    """
    Converts offset into doc.value to line and column in source file.

    Returns:
        line and column or else (-1,-1) if it cannot be computed
    """
    linenum = -1
    colnum = -1

    if doc.lineno is not None:
        linenum = doc.lineno # start of the docstring source
        # line offset with respect to start of cleaned up docstring
        lineoffset = clean_lineoffset = doc.value.count("\n", 0, offset)

        # look at original doc source, if available
        try:
            source = doc.source
            # compute docstring without cleaning up spaces and indentation
            rawvalue = str(literal_eval(source))

            # adjust line offset by number of lines removed from front of docstring
            lineoffset += leading_space(rawvalue).count("\n")

            if lineoffset == 0 and (m := re.match(r"(\s*['\"]{3}\s*)\S", source)):
                # is on the same line as opening triple quote
                colnum = offset + len(m.group(1))
            else:
                # indentation of first non-empty line in raw and cleaned up strings
                raw_line = rawvalue.splitlines()[lineoffset]
                clean_line = doc.value.splitlines()[clean_lineoffset]
                raw_indent = len(leading_space(raw_line))
                clean_indent = len(leading_space(clean_line))
                try:
                    linestart = doc.value.rindex("\n", 0, offset) + 1
                except ValueError: # pragma: no cover
                    linestart = 0 # paranoid check, should not really happen
                colnum = offset - linestart + raw_indent - clean_indent

        except Exception:
            # Don't expect to get here, but just in case, it is better to
            # not fix up the line/column than to die.
            pass

        linenum += lineoffset

    return linenum, colnum


def leading_space(s: str) -> str:
    """Returns whitespace at the front of string."""
    if m := re.match(r"\s*", s):
        return m[0]
    return "" # pragma: no cover

analog-cbarber avatar Mar 23 '25 16:03 analog-cbarber

Thanks! Non-trivial indeed. OK so I was only thinking in terms of column offset, but IIUC the offset parameter here is a position in the string, which can contain arbitrary newlines (\n) and indentation that differ between the cleaned-up docstring and the source code, and we want to convert it back to the position in the source docstring.

Such a method would be necessary for dynamic analysis too, starting at Python 3.13, since this version started cleaning up docstrings at the compiler level.
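A rough way to see the runtime side of this is sketched below. The function `example` is hypothetical. As far as I understand, on Python 3.13 and later the compiler has already stripped the common leading whitespace from `__doc__`, while older versions keep the indentation as written, so the printed raw value depends on the interpreter.

```python
import inspect
import sys


def example():
    """
    Summary line.

        Indented detail.
    """


raw_doc = example.__doc__

# The raw value differs between interpreter versions: older versions keep
# the source indentation, newer ones may have removed it at compile time.
print(sys.version_info[:2], repr(raw_doc))

# After cleandoc the result is the same either way, which means the
# original column information cannot be recovered from __doc__ alone.
cleaned = inspect.cleandoc(raw_doc)
print(repr(cleaned))
```

In both cases a converter has to go back to the source text, not the runtime string, to find true columns.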

Could you show examples of how you would use this method in your own code 🙂?

pawamoy avatar Mar 24 '25 10:03 pawamoy

Basically, I plan to use it when logging warnings on bad crossrefs in docstrings. Here is what my current code looks like (but I am still working on this):

        if parent is not None:  # pragma: no branch
            # We include the file:// prefix because it helps IDEs such as PyCharm
            # recognize that this is a navigable location it can highlight.
            prefix = f"file://{parent.filepath}:"
            line, col = doc_value_offset_to_location(doc, self._cur_offset)
            if line >= 0:
                prefix += f"{line}:"
                if col >= 0:
                    prefix += f"{col}:"

            prefix += " \n"

        logger.warning(prefix + msg)

I think if you wanted to support smart syntax highlighting in docstrings, you probably wouldn't want to make a call to this function for every character for which you need a true location. Probably instead you would just want a function that gives the true starting line and column offsets for the first and subsequent lines. The column offset for the first line can be different if it is on the same line as the quotes, but I believe that all the subsequent ones should be the same.
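A minimal sketch of that idea follows. It assumes the cleanup matches `inspect.cleandoc` and works on the raw (uncleaned) docstring value, not on the source file. The names `DocOffsets`, `compute_doc_offsets`, and `clean_to_raw` are hypothetical and not part of griffe.

```python
import inspect
from typing import NamedTuple


class DocOffsets(NamedTuple):
    lines_removed: int      # blank lines dropped from the front of the raw value
    first_line_shift: int   # characters stripped from the first kept line
    other_lines_shift: int  # indentation stripped from every later line


def compute_doc_offsets(raw: str) -> DocOffsets:
    """Compute once per docstring what cleandoc-style cleanup removes."""
    lines = raw.expandtabs().split("\n")
    # Margin: smallest indentation of the non-blank lines after the first.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines[1:] if l.lstrip(" ")]
    margin = min(indents, default=0)
    # Mirror the stripping so we can count the leading lines that end up empty.
    stripped = [lines[0].lstrip(" ")] + [l[margin:] for l in lines[1:]]
    removed = 0
    while removed < len(stripped) and not stripped[removed]:
        removed += 1
    first_shift = len(lines[0]) - len(stripped[0]) if removed == 0 else margin
    return DocOffsets(removed, first_shift, margin)


def clean_to_raw(raw: str, clean_offset: int) -> tuple[int, int]:
    """Map an offset in cleandoc(raw) to a 0-based (line, column) in raw."""
    clean = inspect.cleandoc(raw)
    off = compute_doc_offsets(raw)
    clean_line = clean.count("\n", 0, clean_offset)
    line_start = clean.rfind("\n", 0, clean_offset) + 1  # 0 when on first line
    shift = off.first_line_shift if clean_line == 0 else off.other_lines_shift
    return clean_line + off.lines_removed, clean_offset - line_start + shift


raw = "\n    Summary line.\n\n    See [bad.ref][] here.\n    "
offset = inspect.cleandoc(raw).index("[bad.ref]")
print(compute_doc_offsets(raw))   # lines_removed=1, both shifts 4
print(clean_to_raw(raw, offset))  # (3, 8)
```

To get positions in the source file, a caller would still need to add the docstring's starting line and, when the text starts on the same line as the opening quotes, the width of everything up to and including those quotes. Escape sequences and tabs in the source are not handled here.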

analog-cbarber avatar Mar 28 '25 14:03 analog-cbarber