lsp-mode
lsp-mode copied to clipboard
Feature Request: UTF-8 support
It seems likely that a future version of the Language Server Protocol will allow servers and clients to support UTF-8 instead of UTF-16: https://github.com/microsoft/language-server-protocol/issues/376#issuecomment-960633758
I'd like to request that lsp-mode support UTF-8 as an encoding. One benefit is to avoid https://github.com/emacs-lsp/lsp-mode/issues/2080 and it will also hopefully have a speed benefit when compared to full UTF-16 support.
Rust Analyzer LSP Server has already implemented their own version of UTF-8 negotiation although I don't have a link handy.
ATM we run using utf-8. we should fix #2080 in a way to allow using utf8 when the server supports that.
I'd love to implement that on clojure-lsp side, although have no clue what would be like the necessary changes
ATM we run using utf-8
@yyoncho did you mean to say UTF-16 here?
Specific improvement here would be to add "positionEncodings": ["utf-8", "utf-16"] somewhere here:
https://github.com/emacs-lsp/lsp-mode/blob/b4e8aac32d28dfe0f73e0981387c5b20249f385c/lsp-mode.el#L3530-L3625
Just signaling support for utf8 should be enough to solve the problems for servers which support utf8
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#positionEncodingKind
I just ran into this issue today, it still crashes rust-analyzer.
Can we get at least one of these fixes applied?
- correct offsets: https://github.com/yyoncho/lsp-mode/commit/aac9b6a10611ce9fcb4c9fe272f2835f0b6bb210
- declaring UTF-8 lsp mode: https://github.com/emacs-lsp/lsp-mode/issues/3344#issuecomment-1320677622
I’m pretty unclear on the current state of this plugin after reading through comments on this and the related issues. I understand that right now this plugin only declares support for UTF-16.
(1) are there any outstanding encoding issues with UTF-16?
(2) does this plugin actually support UTF-8 but just not announce that support to servers, or does it not actually support UTF-8 (in which case it should not claim that it does)?
On Thu, Feb 2, 2023 at 11:41 PM teor @.***> wrote:
I just ran into this issue today, it still crashes rust-analyzer.
Can we get at least one of these fixes applied?
- correct offsets: @.*** https://github.com/yyoncho/lsp-mode/commit/aac9b6a10611ce9fcb4c9fe272f2835f0b6bb210
- declaring UTF-8 lsp mode: #3344 (comment) https://github.com/emacs-lsp/lsp-mode/issues/3344#issuecomment-1320677622
— Reply to this email directly, view it on GitHub https://github.com/emacs-lsp/lsp-mode/issues/3344#issuecomment-1414930459, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGD5T76JCTEDLX7TX4Z6YTWVSD6TANCNFSM5NUM5MVQ . You are receiving this because you commented.Message ID: @.***>
@ddickstein I'm not a lsp-mode user, but my understanding is that yes, lsp-mode still sends UTF-8 offsets while the protocol mandates UTF-16. This has probably been reported dozens times as a rust-analyzer bug (because lsp-mode corrupts our state), but most likely affects other language servers just as well.
Advertising UTF-8 offset encoding might fix RA, but it's optional, so servers which don't implement it will still have problems. But it fixes our problem, it's better for performance, and the UTF-16 encoding can still be fixed in the future, maybe like in https://github.com/yyoncho/lsp-mode/commit/aac9b6a10611ce9fcb4c9fe272f2835f0b6bb210.
As probably noticed elsewhere, fasterthanli did a writeup of this issue -
https://fasterthanli.me/articles/the-bottom-emoji-breaks-rust-analyzer#the-language-server-protocol
@yyoncho I hope this is useful and not annoying, thanks for all your help over the years, lsp-mode has been awesome and I've never noticed this bug.
Can someone using the patch report if there are noticeable perf impact? If no - I will merge it as default option.
@yyoncho can you confirm @lnicola's reply above is correct (namely, that already UTF-8 is supported just not declared, and UTF-16 is declared but has a bug), and if so can UTF-8 support be declared and this issue closed and a separate issue be made to track the bug in UTF-16?
It's incorrect. It seems that the "native" encoding of Emacs is UTF-32, which lsp-mode now advertises. So UTF-8 is still unimplemented, but that's fine from rust-analyzer's point of view because we now support UTF-32.