LSP: unicode escape sequences cause fatal exception: "Invalid_argument Char.chr"

Open mwiencek opened this issue 4 years ago • 1 comment

I was trying to get LSP working with Sublime Text 4, but Flow kept crashing due to the presence of non-ASCII characters in my source code. I was able to reproduce this with a small one-file project containing only "”":

::  -> flow textDocument/didOpen: {'textDocument': {'version': 0, 'languageId': 'javascript', 'text': '// @flow\n\n"”";\n', 'uri': 'file:///home/michael/code/sublime-flow-lsp-chr-issue/index.js'}}
flow: Client fatal exception: (Invalid_argument Char.chr)
flow: Starting Flow server
flow: Raised at file "stdlib.ml", line 30, characters 20-45
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 183, characters 10-23
:: <-  flow window/logMessage: {'type': 3, 'message': 'Starting Flow server'}
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 200, characters 14-22
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 318, characters 10-22
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 265, characters 12-23
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 284, characters 14-21
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 318, characters 10-22
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 265, characters 12-23
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 284, characters 14-21
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 318, characters 10-22
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 265, characters 12-23
flow: Called from file "src/hack_forked/utils/hh_json/hh_json.ml", line 284, characters 14-21
flow: Called from file "src/hack_forked/utils/jsonrpc/jsonrpc.ml", line 159, characters 35-63
flow: Called from file "src/hack_forked/utils/sys/daemon.ml", line 272, characters 4-26
flow: Called from file "src/flow.ml", line 106, characters 4-31
flow: ---
flow: Raised by primitive operation at file "src/core/lwt.ml", line 1930, characters 23-26
flow: Called from file "src/core/lwt.ml", line 1214, characters 10-18
flow: Called from file "src/core/lwt.ml", line 1280, characters 17-21
flow: Called from file "src/core/lwt.ml", line 1316, characters 4-103
flow: Called from file "src/core/lwt.ml", line 1390, characters 37-76
flow: Called from file "src/core/lwt_sequence.ml", line 128, characters 31-47
flow: Called from file "src/unix/lwt_main.ml", line 25, characters 2-22
flow: Called from file "src/common/lwt/lwtInit.ml", line 129, characters 4-135
flow: Called from file "src/commands/commandUtils.ml", line 13, characters 4-32
flow: Called from file "src/flow.ml", line 109, characters 4-21

@rwols was also able to reproduce the issue and noted:

looks like flow doesn't like unicode-escaped characters like \ufffdabc. If I set ensure_ascii=False in the json encoder then it doesn't crash [...] but now it looks like it doesn't understand utf-16 column offsets. Oh well, I don't think many people will use utf-16 surrogate pairs in their code

He submitted a workaround at https://github.com/sublimelsp/LSP/pull/1701 which I'm very thankful for, but it sounds like something that can be better handled in Flow too!
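For anyone trying to follow along, here is a minimal sketch of the two client-side behaviours mentioned in the quote above. It only uses Python's standard `json` module; the helper name `utf16_length` is made up for illustration and is not part of the Sublime LSP package or of Flow.

```python
import json

# The same one-file project text as in the log above.
source = '// @flow\n\n"”";\n'

# Default behaviour (ensure_ascii=True): non-ASCII characters are written
# as \uXXXX escapes, which is the form that appears to trigger the crash.
escaped = json.dumps({"text": source})

# With ensure_ascii=False the character is emitted as-is instead of escaped.
raw = json.dumps({"text": source}, ensure_ascii=False)


def utf16_length(s: str) -> int:
    """Hypothetical helper: number of UTF-16 code units in s.

    LSP positions count UTF-16 code units by default, so a character
    outside the Basic Multilingual Plane occupies two columns.
    """
    return len(s.encode("utf-16-le")) // 2


print(escaped)
print(raw)
print(len("😃"), utf16_length("😃"))
```

The last line illustrates the column-offset concern: Python sees one character where a UTF-16 based offset sees two code units.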

mwiencek avatar May 29 '21 19:05 mwiencek

I tried this in Python:

json.dumps({"text": "😃"})
'{"text": "\\ud83d\\ude03"}'

It seems that our JSON library, Hh_json, doesn't properly implement this part of the JSON RFC:

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair.  So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
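
As a rough illustration of what the RFC passage asks of a parser, here is a sketch in Python (not Hh_json's actual code) that combines a high/low surrogate pair into a single code point. The function name `decode_unicode_escapes` is hypothetical, and the sketch deliberately ignores other JSON escapes such as `\\` or `\n`. For context, OCaml's `Char.chr` raises `Invalid_argument` for values outside 0..255, which would be consistent with the exception in the log if the decoder passes a 16-bit value to it, but that part is an assumption.

```python
import re

# Matches a single \uXXXX escape (four hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(s: str) -> str:
    """Decode \\uXXXX escapes, joining UTF-16 surrogate pairs (sketch only)."""
    out = []
    pos = 0
    while pos < len(s):
        m = _ESCAPE.match(s, pos)
        if not m:
            # Not an escape: copy the character through unchanged.
            out.append(s[pos])
            pos += 1
            continue
        unit = int(m.group(1), 16)
        pos = m.end()
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: look for a low surrogate right after it.
            low = _ESCAPE.match(s, pos)
            if low:
                low_unit = int(low.group(1), 16)
                if 0xDC00 <= low_unit <= 0xDFFF:
                    # Combine the pair into one code point above U+FFFF.
                    unit = 0x10000 + ((unit - 0xD800) << 10) + (low_unit - 0xDC00)
                    pos = low.end()
        out.append(chr(unit))
    return "".join(out)


print(decode_unicode_escapes(r"\ud83d\ude03"))
```

With the pair from the Python example above, `\ud83d\ude03` comes out as U+1F603, and the RFC's `\uD834\uDD1E` comes out as U+1D11E.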

mroch avatar Dec 19 '22 14:12 mroch