pygls icon indicating copy to clipboard operation
pygls copied to clipboard

TextDocument.lines and completion request at EOF

Open nthykier opened this issue 6 months ago • 20 comments

I have been using pygls as the underlying LSP implementation for my Language server debputy, which interprets various Debian based formats. I observed a reoccurring problem that I would get out of bound errors when during completions at the end of the file and had to fix that in each file format where it occurred.

I traced this down to me using TextDocument.lines and receiving a client position of (N + 1, 0) where N is the number of lines returned by TextDocument.lines. This occurs when the user creates a new line at the end of the file and then trigger completion. Consider the following YAML snippet where <CURSOR> marks where the user would insert its next character if they typed one:

foo:
  bar: some-value<CURSOR>

At this point, the text document ends with ...bar: some-value. If the user then hits enter followed by triggering a newline to complete another top-level key (sibling to foo), then naive usage of TextDocument.lines + looking up the client position will trigger a value error (for simplicity I am leaving out the position codec stuff; it is there and not relevant to the problem).

This happens because the file content (TextDocument.source) now ends with ...bar: some-value\n. Because there is nothing after the last newline, the splitlines will result in the list:

[
    "foo:\n",
    "  bar: some-value\n"
]

However, the client position for this is in many editors (2, 0) and that cause an index error. I have triggered this case with emacs and kate so far out of the three I have tested with (vim being the last, and the fact I have not reproduced likely says more about my skills with vim than whether the problem is reproducible)

The easy solution is to special-case TextDocument.lines to add a new empty string to the list of lines, which is basically the work around I have been using in my LSP. Adding the empty string is also safe in case the lines are later joined back to a single string.

nthykier avatar Jun 15 '25 07:06 nthykier

Thanks for the detailed report and nice bug hunting. So we're talking about this method? And the idea is just to append a newline to the lines that that method returns?

Do you think this could also be to do with the default behaviour of some editors that ensure files always ends with a newline? This being a POSIX requirement. When you say, "At this point, the text document ends with ...bar: some-value", I wonder if that's really true, as strictly speaking a text file isn't valid unless it ends with a newline?

I'm curious, would appending the newline to Textdocument.lines be reliably consistent in all cases?

tombh avatar Jun 16 '25 19:06 tombh

Thanks for the detailed report and nice bug hunting. So we're talking about this method? And the idea is just to append a newline to the lines that that method returns?

It would be to ensure that the line position provided by the client is always resolvable, which it is not when the position is at the EOF due to how .splitlines handles that particular case.

Do you think this could also be to do with the default behaviour of some editors that ensure files always ends with a newline? This being a POSIX requirement. When you say, "At this point, the text document ends with ...bar: some-value", I wonder if that's really true, as strictly speaking a text file isn't valid unless it ends with a newline?

The problem occurs exactly when the file end on a newline and you attempt to complete just after the final newline. The file without the newline was to demonstrate how to get there conceptually and I recommend we "step back" from my starting explanation here since it did not aid your understanding as I intended.

At the time of completion, there is newline at the end of the file and literally nothing after said final newline. The LSP specs enable you to trigger a completion without typing anything and you have to do that to trigger this problem (the moment you type, the typed character would cause an extra element in the list returned by .splitlines(True) and you avoid the crash).

I'm curious, would appending the newline to Textdocument.lines be reliably consistent in all cases?

I think my explanation might have lead you astray. There is no physical adding of a newline in my variant of the code. The code that I use would be basically look like this for TextDocument:

def lines(self) -> List[str]:
       lines = self.source.splitlines(True)
       if not lines or lines[-1].endswith("\n"):
           lines.append("")
       return lines

This also works for the special case of a completely empty file, where the completion request (if triggered without typing) would have position (0, 0) but "".splitlines(True) returns [] so doing doc.lines[position.line] will raise IndexError. The choice of lines.append("") means that a naive "".join(lines) will equal to doc.source - that is many uses of lines` will find this change transparent.

nthykier avatar Jun 17 '25 20:06 nthykier

Ohh yes, that makes more sense, thank you.

So I don't know enough about the usage of TextDocument.lines in the wild to make a call on this. But my hunch is that it could break downstream in subtle ways, as TextDocument.lines should just return the literal lines of the file. But that empty lines case is interesting, I think that TextDocument.lines could very reasonably return [""] rather than [] or None. And why do I think that?? Is it the same reason that the list should also always end with ..., ""]??

Also another approach, what do you think about a new method like TextDocument.get_line_at(row_index)?

tombh avatar Jun 18 '25 15:06 tombh

Ohh yes, that makes more sense, thank you.

So I don't know enough about the usage of TextDocument.lines in the wild to make a call on this. But my hunch is that it could break downstream in subtle ways, as TextDocument.lines should just return the literal lines of the file.

For comparison, the current approach also break in subtle ways in the wild. Namely causing off-by-one errors. But I will not hold it against anyone for calling my suggestion for a hack.

But that empty lines case is interesting, I think that TextDocument.lines could very reasonably return [""] rather than [] or None. And why do I think that?? Is it the same reason that the list should also always end with ..., ""]??

My rationale for [..., ""] is aimed at pragmatism rather than founded in any principle or ideal (other than the principle of pragmatism, I suppose).

An alternative idea is to return a subclass list that instead of IndexError returns "" for this case. It would have a similar effect as adding an extra element but I can see how it also would rub people the wrong way as well.

Also another approach, what do you think about a new method like TextDocument.get_line_at(row_index)?

I feel get_line_at would go against There should be one-- and preferably only one --obvious way to do it. from the Zen of Python. We already have TextDocument.lines so the "obvious" way of getting a given line would be to do doc.lines[position.line]. The doc.get_line_at(position.line) would at first glance do the "same" but would be less obvious to newcomers. If there is a delta between the two that makes one of them safer (less likely to crash/raise), I inclined to think it should be done on method that people are most likely to pick by default/accident and in my book that would be TextDocument.lines[position.line].

The end goal I am aiming for is to avoid everyone having to do

@ls.feature(TEXT_DOCUMENT_COMPLETION)
def completion(ls, params):
   doc = ls.workspace.get_text_document(params.text_document.uri)
   lines = doc.lines
   server_pos = doc.position_codec.position_from_client_units(lines, params.position)
   # This if-statement is required but people are likely to be missing it (`TextDocument.word_at_position` seems to be missing it)
   if server_pos.line >= len(lines):
       line = ""
   else:
       line = lines[server_pos.line]
   ... # Do completion for `line`

This scenario also has a special-case for position depending on how you go about it since line can be "" . However, fortunately line[0:server_pos.character] and line[server_pos.character:] handles this case gracefully and returns "" for the empty string with server_pos.character being 0 - the TextDocument.word_at_position would also handle this case gracefully (if it did not trip on the missing bounds check for the lines vs. server_pos.line). So that part is less likely to trigger an IndexError (you have to do a single character lookup rather than a range lookup to trigger the exception). This lead me to file this issue.

nthykier avatar Jun 18 '25 19:06 nthykier

The more I think about this the more I think we should do it. main is currently a pre-release for a major version change, so now would be a perfect time to add it.

My rationale goes back to the POSIX requirement that a text file has to end in a new line. So whilst it is possible in theory to have a file that doesn't end in a newline, it is more useful for pygls to commit to a consistent behaviour where its internal representation of a file always ends with a new line. What do you think?

tombh avatar Jun 19 '25 14:06 tombh

The more I think about this the more I think we should do it. main is currently a pre-release for a major version change, so now would be a perfect time to add it.

My rationale goes back to the POSIX requirement that a text file has to end in a new line. So whilst it is possible in theory to have a file that doesn't end in a newline, it is more useful for pygls to commit to a consistent behaviour where its internal representation of a file always ends with a new line. What do you think?

I feel we are still not on the same page on what the problem is since you are talking about whether a file should always ending in a newline. This makes me concerned that you are agreeing with me on the wrong assumption or basis and that the PR will not do what you think it would be doing.

I have attached a screenshot of me doing a naive bit of logging code that logs a few statements and then tries to doc.lines[params.position.line]. Note that the code logs whether doc.source ends with a \n in the second last line of the screenshot. In the other screenshot, you can see the kate editor showing the logs from this program. The cursor is on line that kate displays as 9 (but it correctly reports as 8 in the LSP position). This is the cursor position when I trigger a completion. In the logs, you should see:

  1. The doc.source print which shows the source ends with ....md\n. + the doc.source.ends('\n') is also printed and shows True. Both of these are highlighted and I hope we can agree that this file without any doubt ends with a newline.

  2. Between these two highlights, you might be able to make out the doc.lines list. Note the last element also ends with ....md\n. Additionally there is a print of the params.position which shows the position as 8:0 (Line 8, position 0) - in other words, directly after the final newline.

  3. A crash caused by performing doc.lines[params.position.line]. Image Image

I am happy to do a PR once I feel I have confirmed that we both understand what this scenario is about, and we agree on what the problem is being solved by that PR.

(Related, I digged into doc.word_at_position and it has an explicit guard for this case, so I think pygls has seen this problem in the past)

nthykier avatar Jun 19 '25 15:06 nthykier

I think most of us would find it surprising for TextDocument.lines to return something other than the lines in the text document.

dimbleby avatar Jun 22 '25 00:06 dimbleby

@dimbleby I totally agree and I think OP agrees too. However, I now see there's very reasonable argument to suggest that there's more nuance. At the very least, it's possible for the cursor to be positioned outside the lines of a document. So I think at the very least this is a worthy conversation to be having.

tombh avatar Jun 23 '25 19:06 tombh

@nthykier

I feel we are still not on the same page on what the problem is since you are talking about whether a file should always ending in a newline.

Thanks for your patience and all your effort in taking your time to explain this so clearly.

Now I'm leaning the other way again. What if this is the responsibility of the client, therefore the editor itself? Extending your original example:

foo:
  bar: some-value
<CURSOR>

Are you saying that some editors send that to the LSP server as this?

[
    "foo:\n",
    "  bar: some-value\n"
]

Whereas ideally you're wondering if it should be universally interpreted as this?

[
    "foo:\n",
    "  bar: some-value\n",
    "\n"
]

If that's the case, then if it's the client's responsibility to both send the document's contents to the LSP server and to provide the cursor's position to the user, then could we argue that it's the client's responsibility to ensure that the two are compatible?

tombh avatar Jun 23 '25 20:06 tombh

At the very least, it's possible for the cursor to be positioned outside the lines of a document.

sure. But I find it very artificial to get from there to "so we should pretend that the document is bigger than it is".

Rather, clients and servers should handle the possibility that the cursor is positioned outside the lines of a document.

dimbleby avatar Jun 23 '25 20:06 dimbleby

@nthykier

I feel we are still not on the same page on what the problem is since you are talking about whether a file should always ending in a newline.

Thanks for your patience and all your effort in taking your time to explain this so clearly.

You are welcome. :)

Now I'm leaning the other way again. What if this is the responsibility of the client, therefore the editor itself? Extending your original example:

foo: bar: some-value <CURSOR>

Are you saying that some editors send that to the LSP server as this?

[ "foo:\n", " bar: some-value\n" ]

This is exactly what you will get from lines with clients I have tested with in this particular case. For what it is worth, I am convinced the client is right to provide those two lines as that is the file contents at this time. I even agree with the cursor.

Whereas ideally you're wondering if it should be universally interpreted as this?

[ "foo:\n", " bar: some-value\n", "\n" ]

What I am suggesting is that pygls special-case this scenario and has TextDocument.lines return:

[
   "foo:\n",
   "  bar: some-value\n",
  "",
]

(Note I use "" rather than "\n" from your example for the final line.)

The choice of the trailing empty string is to avoid changing the content if someone does "".join(lines)", which is a technique used in some LSP servers (mine included). Related, LSPs already has to cope with the last line not ending newline, so the empty string should not cause issues with any language server in the wild.

If that's the case, then if it's the client's responsibility to both send the document's contents to the LSP server and to provide the cursor's position to the user, then could we argue that it's the client's responsibility to ensure that the two are compatible?

In your view, what should the client be doing here?

As I see it, that the client provided cursor is compatible with the client provided contents. The way I think of the cursor is the position of the next character to be inserted. If a user was to insert another character, then it would be line position 2 (0-indexed), character position 0 (or 2:0 when using str(params.position)). If you reduce this all the way back to its base case of an empty document, the only position available is 0:0. Insert a newline and you are now at 1:0. Insert another newline and you are a 2:0 (and so on), which is basically the situation we are in here as far as line count goes. If you were to insert an a at this point, the cursor would (in my understanding) then be at 2:1.

The sequence being:

  • Source / what would be saved to disk as the file contents: "", "\n", "\n\na
  • Lines as it is implemented now: [], ["\n"], ["\n", "\n"], `["\n", "\n", "a"]
  • Cursor position: 0:0, 1:0, 2:0, 2:1

nthykier avatar Jun 23 '25 21:06 nthykier

And what if the server wants to

  • count the lines in the document, or
  • lint that it has a posix-compliant file in which all lines (including the last) end in a newline, or
  • access the first character of the last line?

Or probably lots more similar that I have not yet thought of

The artificial trailing empty string brings its own new out-by-ones and confusion.

Perhaps it would be better instead to propose a new helper function doing whatever it is that you want to do, without changing TextDocument.lines

dimbleby avatar Jun 24 '25 00:06 dimbleby

And what if the server wants to

* count the lines in the document, or

* lint that it has a posix-compliant file in which all lines (including the last) end in a newline, or

* access the first character of the last line?

Or probably lots more similar that I have not yet thought of

The artificial trailing empty string brings its own new out-by-ones and confusion.

Perhaps it would be better instead to propose a new helper function doing whatever it is that you want to do, without changing TextDocument.lines

I gather you are not sold on the trade-off being proposed here. If the maintainers are not either then we will simply close this issue with no changes and the current status quo will remain.

nthykier avatar Jun 24 '25 06:06 nthykier

@dimbleby I feel like you haven't been giving this conversation the attention it deserves. Neither OP nor I are here to make significant API changes without good, well-reasoned and well-documented reasons, and also of course community consensus. Not only has the helper function idea already been suggested but all your concerns are trivially rebutted with TextDocument.source, which I'm sure you're already aware of. Not that either of us want to dismiss your input here. However, saying, "Or probably lots more similar that I have not yet thought of", is dismissive and not in keeping with the level of engagement that has been established in this thread. OP has, just like you, clearly invested a lot of time in pygls, and has now taken the time to come with a sincere suggestion for improving the project. As such, whether their point of view is right or wrong is beside the point. As older members of this project our jobs should be to support such motivated newcomers. If we can't support each other on Github Issues and Discussions in exploring new ideas and mutually deepening our understanding, then where can we do so?

tombh avatar Jun 24 '25 15:06 tombh

@nthykier

The way I think of the cursor is the position of the next character to be inserted.

Yet another great point. It feels like it goes right to the heart of the matter. Most of the time this next character is within the existing document, where it can't challenge the existential meaning of the document's boundaries. It is just this edge case of the cursor being at the end of the document that invites pause for reflection. "End", from the cursor's perspective, can be reasonably interpreted as outside the current document and inside the potential new version of the document. This doesn't lead me to a solution, I just really appreciate the way it frames the issue.

In your view, what should the client be doing here?

Always send a document in which the cursor position is valid. But I don't mean that as a serious solution, it's more as a thought experiment. Can we establish a universally agreed upon and intuitive model of a document that pygls can sensibly adhere to whilst allowing individual editor's to reasonably deviate from? The case of Vim and Vim modal editing comes to mind. A significant number, if not a majority, of editors support Vim's Normal mode in which the cursor cannot be placed at the cursor position after EOF. Now of course, Vim's Insert mode is equally important as Normal mode, and it can place the cursor after EOF, so maybe this isn't a useful datapoint.

For the sake of argument, let's say that the document model where a cursor can be placed after EOF is universal enough. It may not be intuitive but it is widely enough implemented that it can be used as a basis for defining the expectations of a method like, TextDocument.lines. But what does "lines" mean here? I mean obviously at face value it means the lines of the document. But what currency does "lines", as a term within the understanding of the document model, have? I think it could be argued that it doesn't have any notable currency, or rather expectations. Whereas the term "document" does. As a practical example to the point I'm trying to make, I think we could even argue that pygls should remove the TextDocument.lines method! Whereas we certainly couldn't remove TextDocument from pygls.

I didn't add the original TextDocument.lines method, so I can't say for sure what the reason for its existence was, but I suspect it was merely for convenience. But the fact that it is now part of the official API may be giving it more formality, or currency, than it deserves. A user is forgiven for assuming that such a trivial function exists because it also encodes deeper guarantees, like for example always supporting indexing by cursor position.

So just as a thought experiment, what do you think about removing the method?

tombh avatar Jun 24 '25 16:06 tombh

I intend only to be making plain technical points - it's true that I am relatively concise in doing so. In particular, noting that I have not thought of everything is certainly not intended to be dismissive of what others may or may not have considered.

I expect it's clear by now that In My Opinion: TextDocument.lines is currently correctly implemented, and the proposed change brings at least as many problems as it solves. I'll leave it there.

dimbleby avatar Jun 24 '25 17:06 dimbleby

@nthykier

The way I think of the cursor is the position of the next character to be inserted.

Yet another great point. It feels like it goes right to the heart of the matter. Most of the time this next character is within the existing document, where it can't challenge the existential meaning of the document's boundaries. It is just this edge case of the cursor being at the end of the document that invites pause for reflection. "End", from the cursor's perspective, can be reasonably interpreted as outside the current document and inside the potential new version of the document. This doesn't lead me to a solution, I just really appreciate the way it frames the issue.

In your view, what should the client be doing here?

Always send a document in which the cursor position is valid. But I don't mean that as a serious solution, it's more as a thought experiment. Can we establish a universally agreed upon and intuitive model of a document that pygls can sensibly adhere to whilst allowing individual editor's to reasonably deviate from? The case of Vim and Vim modal editing comes to mind. A significant number, if not a majority, of editors support Vim's Normal mode in which the cursor cannot be placed at the cursor position after EOF. Now of course, Vim's Insert mode is equally important as Normal mode, and it can place the cursor after EOF, so maybe this isn't a useful datapoint.

To be fair, I never gave the Normal mode cursor model much thought before now you mentioned it here. But I can see it being a source of miscommunication when we are not aligned on which cursor model is used.

I went to check the LSP specs and fortunately it does seem to make a stance on how to interpret the position. As I read it, the LSP specs use the Insert Mode cursor as its position model (https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position):

Position

Position in a text document expressed as zero-based line and zero-based character offset. A position is between two characters like an ‘insert’ cursor in an editor. Special values like for example -1 to denote the end of a line are not supported.

Here I assume that 'insert' cursor translates directly to Insert Mode.

Related: With the cursor framing here, I think what I have been trying to express is that it feels like the current model of lines feels incompatible with the insert cursor positioning model (specifically triggered by the out of bounds case). Not that it aided me to solve the problem differently, but maybe the re-framing here is helpful for someone to spot a better solution.

For the sake of argument, let's say that the document model where a cursor can be placed after EOF is universal enough. It may not be intuitive but it is widely enough implemented that it can be used as a basis for defining the expectations of a method like, TextDocument.lines. But what does "lines" mean here? I mean obviously at face value it means the lines of the document. But what currency does "lines", as a term within the understanding of the document model, have? I think it could be argued that it doesn't have any notable currency, or rather expectations. Whereas the term "document" does. As a practical example to the point I'm trying to make, I think we could even argue that pygls should remove the TextDocument.lines method! Whereas we certainly couldn't remove TextDocument from pygls.

I didn't add the original TextDocument.lines method, so I can't say for sure what the reason for its existence was, but I suspect it was merely for convenience. But the fact that it is now part of the official API may be giving it more formality, or currency, than it deserves. A user is forgiven for assuming that such a trivial function exists because it also encodes deeper guarantees, like for example always supporting indexing by cursor position.

So just as a thought experiment, what do you think about removing the method?

I think I see where you are coming from. If positions / ranges had not been line-based, I would have been inclined to agree with you. However, given line numbers are central in all positioning then I think the lines of the documents has some currency as an entity. All the position encoding and decoding also heavily relies on accessing individual lines of the files. In my view TextDocument should definitely have a trivial translation into the lines for that reasoning.

Both in my language server but also all the examples in pygls that interfaces with the content of the document question end up using doc.lines. The exceptions end up being stuff like that commands.py where the commands just work on the arguments they are given regardless of which document they are used in.

For what it is worth, I believe the JetBrains (IDEA, PyCharm, etc.) stack focuses less on lines in their API model towards plugin writers. Here every part of the document is tokenized (a text range that can span 1 or more characters including newline characters) and an AST is built on top. You still report things against text ranges, but the ranges uses character offset relative to another token or the beginning of the file rather than line + character offset in the line. At some point their implementation of the API must then translate those offset into the lines shown to the user, but the plugin writer is never involved as I remember. Here, there is no "doc.lines". But importantly, you do not need it because everything is character offsets and you (as a plugin writer) never need a line offset.

It is possible to define an API for working with a text document where the language server would never need a "lines" model of the document. However, with the LSP specs heavy focus on lines in every position and range, I find it difficult to see a model where it would be the right choice for a language server. Equally importantly, I think pygls as a language server framework is in the best position to define a common suitable lines model for language servers implemented in pygls. As such, I would not immediately sold on the proposal as a long term solution.

Though short term, I could buy it as "TextDocument.lines is not the right model currently and we do not want to commit to it, so lets remove it in 2.0 until we can find the right model". I would see it as a loss and it would likely hamper adoption to 2.0, but I am also a fan of "less is more" for APIs - especially if you are suspect that the API in question might be the wrong model. Though, it would still have consequences for the pygls API (position codec immediately comes to mind for me as something that would be massively affected). So if you are seriously entertaining the idea of removing that method, we would need to consider the consequences for other API methods.

(Side tracking a bit here: I can entertain position codec being stateful and bound the to TextDocument it is from, so we skip the lines argument to all of its methods. Not sure it solves all cases or is the right solution, but it is certainly an API that I could see the benefit in as a language server writer, since I would not have to throw lines around just for the position translation. Though I have personally been limited in my work to not using the workspace related features, so perhaps there is a use-case there that I do not know about where this alternative model/API would fall apart)

nthykier avatar Jun 24 '25 18:06 nthykier

I went to check the LSP specs and fortunately it does seem to make a stance on how to interpret the position. As I read it, the LSP specs use the Insert Mode cursor as its position model (https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position):

Good thinking, it's useful to have the spec's definition of position. I wonder if the "insert cursor" is referring more to the pipe-style cursor, "|", to contrast it with the block style one, "█"? Which as it happens is also ho Vim modes express Insert and Normal modes.

However, given line numbers are central in all positioning then I think the lines of the documents has some currency as an entity.

Another great point. How can we say that line numbers don't have currency when we talk about document models!? They're inextricable components of diffs, error messages, Github links, etc.

For what it is worth, I believe the JetBrains (IDEA, PyCharm, etc.) stack focuses less on lines in their API model towards plugin writers. Here every part of the document is tokenized (a text range that can span 1 or more characters including newline characters) and an AST is built on top.

Interesting, I didn't know that. Of course a tree doesn't have line numbers, everything is related to parent, siblings and children.

It is possible to define an API for working with a text document where the language server would never need a "lines" model of the document. However, with the LSP specs heavy focus on lines in every position and range, I find it difficult to see a model where it would be the right choice for a language server. Equally importantly, I think pygls as a language server framework is in the best position to define a common suitable lines model for language servers implemented in pygls. As such, I would not immediately sold on the proposal as a long term solution.

Agreed. I think the thought experiment idea has been useful though, because I actually have a serious idea now! What if we change the signature of the TextDocument.lines method to:

def lines(self, cursor_position: Optional[types.Position] = None) -> Sequence[str]:
        """Return the lines of the document.

        Parameters
        ----------
        cursor_position
           The position of the cursor (optional). When the cursor is at the end of a
           document (or at the beginning of an empty document) editors must create a
           temporary new line within which the cursor can be displayed. Whilst this
           new line is not yet formally part of the document source it is nevertheless
           relevant. Consider an LSP completion request in a new and empty file. Even
           though `TextDocument.source` is `[]` the editor's cursor position occupies
           the empty string at `[""]`. Providing this argument is useful to avoid
           unexpected index out of bounds errors.

        """
        lines = self.source.splitlines(True)
        if cursor_position and cursor_position.line >= len(lines):
            lines.append("")
        return lines

Firstly, this is completely backwards compatible and then, most importantly, it both encodes the behaviour in the type system and clearly articulates the reasoning in the documentation.

Pleasingly this also seamlessly fits into both TextDocument.word_at_position and TextDocument.offset_at_position that already take the cursor position as an argument.

tombh avatar Jun 26 '25 17:06 tombh

[...]

Agreed. I think the thought experiment idea has been useful though, because I actually have a serious idea now! What if we change the signature of the TextDocument.lines method to:

Thanks for proposing an alternative solution. :)

def lines(self, cursor_position: Optional[types.Position] = None) -> Sequence[str]: """Return the lines of the document.

    Parameters
    ----------
    cursor_position
       The position of the cursor (optional). When the cursor is at the end of a
       document (or at the beginning of an empty document) editors must create a
       temporary new line within which the cursor can be displayed. Whilst this
       new line is not yet formally part of the document source it is nevertheless
       relevant. Consider an LSP completion request in a new and empty file. Even
       though `TextDocument.source` is `[]` the editor's cursor position occupies
       the empty string at `[""]`. Providing this argument is useful to avoid
       unexpected index out of bounds errors.

I think we are trying to say that for an empty document, TextDocument.lines would be [] except when due to the cursor it will be [""]. However, the text end up saying is that 'TextDocument.sourceis a[](Sequence), whileTextDocument.sourcewould be""` (str).

    """
    lines = self.source.splitlines(True)
    if cursor_position and cursor_position.line >= len(lines):
        lines.append("")
    return lines

Firstly, this is completely backwards compatible and then, most importantly, it both encodes the behaviour in the type system and clearly articulates the reasoning in the documentation.

Currently, TextDocument.lines is a property and cannot take extra parameters in practice - it would have to turn into a regular method for the extra parameters to be usable (adding a optional kwarg to a @property does not trigger an error at definition time, but when using the property, all "obvious" attempts to pass the parameter failed in my tests). As I understand it, turning it into a regular method would break backwards compatibility. My best "counter" would be to keep lines as a property that returns a magic return value, which behaves like a list and a function at the same time (old code would use it like a list and new code like a function).

The magic return value should be doable, if we want to go down this route. Though, it would implement another trade-off between ease of migration/supporting early adopters (being backwards compatible) and maintaining simplicity of the API with this solution (drop @property with no backwards compatibility).

nthykier avatar Jun 29 '25 06:06 nthykier

Ohh the @property, I completely forgot about that! Can you give an example of this magic return value?

But also, removing the @property would only be a breaking change from lines to lines() right? So LSP authors wouldn't need to change anything more than that, which I think is reasonable for a major version change.

And yes of course we can change the doc comments.

tombh avatar Jun 29 '25 14:06 tombh