
Add option to `Document.source` to read files without universalized line endings.

Open karthiknadig opened this issue 3 years ago • 6 comments

Currently pygls returns the text document source with universalized line endings. This is an issue when computing diffs and when applying changes. Universalized line endings are not always the right choice when reading source files. See https://github.com/openlawlibrary/pygls/blob/bbf671f509b0d499a006daeeae8cdb78e3419fe5/pygls/workspace.py#L273-L277

This is an example of the problem: while handling a refactoring request, the result ends up with too many line endings. https://github.com/pappasam/jedi-language-server/issues/159 . We had a similar problem implementing formatting. This becomes an issue in any scenario where text edits have to be handled.
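To illustrate the diff side of this, here is a minimal sketch that is not taken from pygls or jedi-language-server. It assumes the client holds text with CRLF endings while the server works on a copy normalized to LF; the sample text and all variable names are made up for the illustration.

```python
import difflib

# Text as the editor holds it (CRLF endings preserved).
client_text = "import json\r\nimport sys\r\n\r\nprint(sys.prefix)\r\n"

# The same text after universal-newline normalization, i.e. what a
# default text-mode read of the file would return.
normalized_text = client_text.replace("\r\n", "\n")

client_lines = client_text.splitlines(keepends=True)
normalized_lines = normalized_text.splitlines(keepends=True)

# Lines that differ only because of their line ending.
changed = [(a, b) for a, b in zip(client_lines, normalized_lines) if a != b]

# A line-based diff reports every line as removed and re-added.
diff = list(difflib.unified_diff(client_lines, normalized_lines, lineterm=""))
removed = [d for d in diff if d.startswith("-") and not d.startswith("---")]
added = [d for d in diff if d.startswith("+") and not d.startswith("+++")]

print(len(changed), len(removed), len(added))
```

Even though the visible content is identical, every line counts as changed, so any edit derived from such a diff touches the whole file.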

In the message where pygls gets the text from the Language client, I see that the IDE preserves the line endings. See this trace for textDocument/didOpen: the server receives the full text and the line endings are left as is.

[Trace - 11:14:48 PM] Sending notification 'textDocument/didOpen'.
Params: {
    "textDocument": {
        "uri": "file:///c%3A/GIT/s%20p/vscode-python/pythonFiles/interpreterInfo.py",
        "languageId": "python",
        "version": 1,
        "text": "# Copyright (c) Microsoft Corporation. All rights reserved.\r\n# Licensed under the MIT License.\r\n\r\nimport json\r\nimport sys\r\n\r\nobj = {}\r\nobj[\"versionInfo\"] = tuple(sys.version_info)\r\nobj[\"sysPrefix\"] = sys.prefix\r\nobj[\"sysVersion\"] = sys.version\r\nobj[\"is64Bit\"] = sys.maxsize > 2**32\r\n\r\nprint(json.dumps(obj))\r\n"
    }
}

It would be helpful if we could control the line endings for source files.

karthiknadig avatar Feb 10 '22 07:02 karthiknadig

I think I'm seeing this issue too. But I don't completely understand the issue. Are you saying that forcing of UTF-8 in io.open(self.path, 'r', encoding='utf-8') is causing problems when the file itself is not UTF-8?

tombh avatar Dec 03 '22 20:12 tombh

I may not understand this issue, but in my opinion encoding does not play a role here. I encountered this issue some time ago: I had to change the line endings in the source file so the operation would finish without adding extra lines, and then change the line endings back. It happened while editing a Python source file in VSCode.

zbyna avatar Dec 03 '22 22:12 zbyna

You had to change line endings in the source file? How do you mean? You, as an LSP user, had to change line endings in your editor, VSCode or Vim or something? That obviously is not the solution we're looking for here, because we want to be able to support all forms of line endings.

tombh avatar Dec 05 '22 20:12 tombh

OK. We are in the same boat. Changing line endings just to make LSP work properly is not a solution. To avoid confusion, here is a video showing how LSP behaves with different line endings (UTF-8 encoded source file):

!video

zbyna avatar Dec 05 '22 23:12 zbyna

From "Will open() function change line-endings in files?" on python.org:

Assuming that you have CRLF endings in your file, the logic is:

    When reading, you will get a string with just LF.
    When you write that string back into the file, LF will be converted to CRLF again.
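The read half of this logic can be checked with a small self-contained sketch. The temporary file and the variable names are illustrative only; the file is written in binary mode so that it contains CRLF on every platform.

```python
import os
import tempfile

# Write CRLF bytes directly so the file content is the same on any OS.
with tempfile.NamedTemporaryFile(delete=False, suffix=".py") as tmp:
    tmp.write(b"import sys\r\nprint(sys.prefix)\r\n")
    path = tmp.name

try:
    # Default text mode (newline=None): universal newlines, CRLF becomes LF.
    with open(path, "r", encoding="utf-8") as f:
        translated = f.read()

    # newline='': no translation on read, line endings come back untouched.
    with open(path, "r", encoding="utf-8", newline="") as f:
        untranslated = f.read()
finally:
    os.remove(path)

print(repr(translated))
print(repr(untranslated))
```

The write-side conversion depends on the platform: with the default newline=None, '\n' is written as os.linesep, which is CRLF on Windows.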

zbyna avatar Dec 06 '22 00:12 zbyna

Fantastic, I see exactly what you mean now. Thank you.

Also, I just read the Jedi server issue and there @karthiknadig suggested adding newline='' to Pygls' open() call. So, I think it would become this:

def source(self) -> str:
    if self._source is None:
        # newline='' turns off newline translation, so CRLF/CR are preserved
        with io.open(self.path, 'r', encoding='utf-8', newline='') as f:
            return f.read()
    return self._source

Could this be a potential fix?
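As a rough sketch of how the option requested in the title might look, the class below makes the behavior switchable. SketchDocument and its preserve_newlines flag are hypothetical names invented for this example; they are not part of the pygls API.

```python
import io
import os
import tempfile
from typing import Optional


class SketchDocument:
    """Hypothetical stand-in for a workspace document; not the pygls class."""

    def __init__(self, path: str, source: Optional[str] = None,
                 preserve_newlines: bool = True) -> None:
        self.path = path
        self._source = source
        # Hypothetical flag: True keeps the file's line endings as they are.
        self.preserve_newlines = preserve_newlines

    @property
    def source(self) -> str:
        if self._source is None:
            # newline='' disables translation; newline=None normalizes to LF.
            newline = "" if self.preserve_newlines else None
            with io.open(self.path, "r", encoding="utf-8", newline=newline) as f:
                return f.read()
        return self._source


# Usage: read the same CRLF file with and without preservation.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a = 1\r\nb = 2\r\n")
try:
    preserved = SketchDocument(tmp.name).source
    normalized = SketchDocument(tmp.name, preserve_newlines=False).source
finally:
    os.remove(tmp.name)
```

Defaulting the flag to True is a design choice made for this sketch; keeping the old behavior as the default would be the more conservative option for existing servers.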

tombh avatar Dec 06 '22 14:12 tombh