lsp-mode icon indicating copy to clipboard operation
lsp-mode copied to clipboard

Feature Request: UTF-8 support

Open axelson opened this issue 3 years ago • 12 comments
trafficstars

It seems likely that a future version of the Language Server Protocol will allow servers and clients to support UTF-8 instead of UTF-16: https://github.com/microsoft/language-server-protocol/issues/376#issuecomment-960633758

I'd like to request that lsp-mode support UTF-8 as an encoding. One benefit is to avoid https://github.com/emacs-lsp/lsp-mode/issues/2080 and it will also hopefully have a speed benefit when compared to full UTF-16 support.

Rust Analyzer LSP Server has already implemented their own version of UTF-8 negotiation although I don't have a link handy.

axelson avatar Feb 05 '22 20:02 axelson

ATM we run using utf-8. we should fix #2080 in a way to allow using utf8 when the server supports that.

yyoncho avatar Feb 05 '22 20:02 yyoncho

I'd love to implement that on clojure-lsp side, although have no clue what would be like the necessary changes

ericdallo avatar Feb 05 '22 20:02 ericdallo

ATM we run using utf-8

@yyoncho did you mean to say UTF-16 here?

ddickstein avatar Aug 08 '22 19:08 ddickstein

Specific improvement here would be to add "positionEncodings": ["utf-8", "utf-16"] somewhere here:

https://github.com/emacs-lsp/lsp-mode/blob/b4e8aac32d28dfe0f73e0981387c5b20249f385c/lsp-mode.el#L3530-L3625

Just signaling support for utf8 should be enough to solve the problems for servers which support utf8

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#positionEncodingKind

matklad avatar Nov 19 '22 00:11 matklad

I just ran into this issue today, it still crashes rust-analyzer.

Can we get at least one of these fixes applied?

  • correct offsets: https://github.com/yyoncho/lsp-mode/commit/aac9b6a10611ce9fcb4c9fe272f2835f0b6bb210
  • declaring UTF-8 lsp mode: https://github.com/emacs-lsp/lsp-mode/issues/3344#issuecomment-1320677622

teor2345 avatar Feb 03 '23 04:02 teor2345

I’m pretty unclear on the current state of this plugin after reading through comments on this and the related issues. I understand that right now this plugin only declares support for UTF-16.

(1) are there any outstanding encoding issues with UTF-16?

(2) does this plugin actually support UTF-8 but just not announce that support to servers, or does it not actually support UTF-8 (in which case it should not claim that it does)?

On Thu, Feb 2, 2023 at 11:41 PM teor @.***> wrote:

I just ran into this issue today, it still crashes rust-analyzer.

Can we get at least one of these fixes applied?

— Reply to this email directly, view it on GitHub https://github.com/emacs-lsp/lsp-mode/issues/3344#issuecomment-1414930459, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGD5T76JCTEDLX7TX4Z6YTWVSD6TANCNFSM5NUM5MVQ . You are receiving this because you commented.Message ID: @.***>

ddickstein avatar Feb 03 '23 14:02 ddickstein

@ddickstein I'm not a lsp-mode user, but my understanding is that yes, lsp-mode still sends UTF-8 offsets while the protocol mandates UTF-16. This has probably been reported dozens times as a rust-analyzer bug (because lsp-mode corrupts our state), but most likely affects other language servers just as well.

Advertising UTF-8 offset encoding might fix RA, but it's optional, so servers which don't implement it will still have problems. But it fixes our problem, it's better for performance, and the UTF-16 encoding can still be fixed in the future, maybe like in https://github.com/yyoncho/lsp-mode/commit/aac9b6a10611ce9fcb4c9fe272f2835f0b6bb210.

lnicola avatar Feb 10 '23 07:02 lnicola

As probably noticed elsewhere, fasterthanli did a writeup of this issue -

https://fasterthanli.me/articles/the-bottom-emoji-breaks-rust-analyzer#the-language-server-protocol

@yyoncho I hope this is useful and not annoying, thanks for all your help over the years, lsp-mode has been awesome and I've never noticed this bug.

hammerandtongs avatar Feb 13 '23 17:02 hammerandtongs

Can someone using the patch report if there are noticeable perf impact? If no - I will merge it as default option.

yyoncho avatar Feb 13 '23 18:02 yyoncho

@yyoncho can you confirm @lnicola's reply above is correct (namely, that already UTF-8 is supported just not declared, and UTF-16 is declared but has a bug), and if so can UTF-8 support be declared and this issue closed and a separate issue be made to track the bug in UTF-16?

ddickstein avatar Mar 29 '23 18:03 ddickstein

It's incorrect. It seems that the "native" encoding of Emacs is UTF-32, which lsp-mode now advertises. So UTF-8 is still unimplemented, but that's fine from rust-analyzer's point of view because we now support UTF-32.

lnicola avatar Mar 29 '23 18:03 lnicola