helix
Helix crashes when editing files with Russian text
Summary
Helix crashes when I edit a Markdown document that consists partly or entirely of Cyrillic characters. An error is displayed in the terminal:
% hx README.md
thread 'main' panicked at 'byte index 421 is not a char boundary; it is inside 'З' (bytes 420..422) of `тся.
2. Когда началось её исполнение.
3. Во сколько её исполнение закончилось.
4. Какой код завершения она вернула.
2. Постепенное исполнение`[...]', helix-core/src/syntax.rs:1246:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
- System: ArchLinux
- Terminal: GNOME Console
- LC_ALL: en_US.UTF-8, then set to ru_RU.UTF-8
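For context, the panic message in the summary is the one Rust's standard library produces when a `str` is sliced at a byte offset that falls inside a multi-byte UTF-8 character. The following is a minimal standalone sketch of that condition, not code from Helix; the helper name `checked_slice` is hypothetical.

```rust
/// Slice by byte range without panicking: returns None when either end
/// falls inside a multi-byte UTF-8 sequence (hypothetical helper).
fn checked_slice(text: &str, start: usize, end: usize) -> Option<&str> {
    text.get(start..end)
}

fn main() {
    // 'З' (U+0417) is encoded as two bytes in UTF-8: 0xD0 0x97.
    let text = "aЗb";
    assert_eq!(text.len(), 4);

    // Offsets 0, 1, 3 and 4 are char boundaries; offset 2 is inside 'З'.
    assert!(text.is_char_boundary(1));
    assert!(!text.is_char_boundary(2));

    // `&text[0..2]` would panic with "byte index 2 is not a char boundary".
    // `str::get` reports the same condition as `None` instead.
    assert_eq!(checked_slice(text, 0, 2), None);
    assert_eq!(checked_slice(text, 0, 3), Some("aЗ"));

    // Raw bytes can be sliced at any offset within the length.
    assert_eq!(&text.as_bytes()[0..2], &[b'a', 0xD0u8][..]);
    println!("ok");
}
```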
Reproduction Steps
I tried this:
- hx or RUST_BACKTRACE=1 hx -vv README.md
- File with Cyrillic symbols.
- Crash!
I expected this to happen:
Instead, this happened:
% RUST_BACKTRACE=1 hx -vv README.md
thread 'main' panicked at 'byte index 421 is not a char boundary; it is inside 'З' (bytes 420..422) of `тся.
4. Когда началось её исполнение.
5. Во сколько её исполнение закончилось.
6. Какой код завершения она вернула.
2. Постепенное исполнение`[...]', helix-core/src/syntax.rs:1246:25
stack backtrace:
0: rust_begin_unwind
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
1: core::panicking::panic_fmt
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
[helix.log](https://github.com/helix-editor/helix/files/11675725/helix.log)
2: core::str::slice_error_fail_rt
3: core::str::slice_error_fail
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/str/mod.rs:86:9
4: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
5: tree_sitter::Parser::parse_with::read
6: ts_lexer_start
7: ts_parser__lex
8: ts_parser_parse
9: helix_core::syntax::LanguageLayer::parse
10: std::thread::local::LocalKey<T>::with
11: helix_core::syntax::Syntax::update
12: helix_view::document::Document::apply_impl
13: helix_view::document::Document::apply_inner
14: helix_term::commands::insert::insert_char
15: helix_term::ui::editor::EditorView::insert_mode
16: <helix_term::ui::editor::EditorView as helix_term::compositor::Component>::handle_event
17: helix_term::compositor::Compositor::handle_event
18: tokio::runtime::park::CachedParkThread::block_on
19: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
20: hx::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Helix log
~/.cache/helix/helix.log
2023-06-07T01:14:52.142 helix_view::editor [ERROR] Failed to initialize the LSP for `source.bash` { cannot find binary path }
2023-06-07T01:16:16.213 helix_view::editor [ERROR] editor error: Async job failed: request 2 timed out
2023-06-07T01:16:16.947 helix_view::editor [ERROR] editor error: Async job failed: request 3 timed out
2023-06-07T01:16:17.890 helix_view::editor [ERROR] editor error: Async job failed: request 4 timed out
2023-06-07T01:16:18.480 helix_view::editor [ERROR] editor error: Async job failed: request 5 timed out
2023-06-07T01:16:19.122 helix_view::editor [ERROR] editor error: Async job failed: request 6 timed out
2023-06-07T01:16:20.050 helix_view::editor [ERROR] editor error: Async job failed: request 7 timed out
2023-06-07T01:16:20.768 helix_lsp::transport [ERROR]
Platform
Linux
Terminal Emulator
GNOME Console 44.0
Helix Version
helix 23.05 (7f5940be)
Can you provide an example file where Helix crashes (ideally with reproducible steps that lead to a crash)?
This was also potentially fixed by #7417, and if not, it is likely a duplicate of #6645, so if a reproduction case is not provided I will close this as stale.
Open any Markdown file with Cyrillic characters, edit it and after some time Helix crashes. README.md
What edits are you making? Can you narrow down the reproduction steps? I can't reproduce with random changes
I was able to reproduce this. The problem is that TS asks for the byte range of a node during incremental parsing. That byte range is somehow not aligned to char boundaries, which causes a panic in rope.byte_slice. Ropey doesn't expose an API to directly iterate raw byte chunks, so we need to wrap the ropey chunks iterator (using chunks_at_byte, which is the only ropey method that doesn't panic for non-char-aligned bytes) to be able to handle byte offsets that are not aligned to UTF-8 char boundaries.
This will also fix some other crashes (I think there are multiple open issues caused by this).
It's potentially/likely a grammar bug that some nodes are not aligned to char boundaries (but I am not sure). I think we should fix the panic either way.
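To illustrate the idea of serving raw bytes from a chunk without slicing a `str`, here is a minimal sketch that uses only the standard library. `ChunkedText` and `bytes_at` are hypothetical stand-ins for a rope and its chunk lookup; this is not ropey's API and not the actual Helix change. It assumes the parser's read callback accepts a byte slice starting at an arbitrary byte offset and treats an empty slice as end of input.

```rust
/// Hypothetical stand-in for a rope: text stored as contiguous UTF-8 chunks.
struct ChunkedText<'a> {
    chunks: Vec<&'a str>,
}

impl<'a> ChunkedText<'a> {
    /// Returns the raw bytes from `byte_idx` to the end of the chunk that
    /// contains it, or an empty slice at or past the end of the text.
    /// No `str` slicing happens, so `byte_idx` may fall inside a character.
    fn bytes_at(&self, byte_idx: usize) -> &'a [u8] {
        let mut chunk_start = 0;
        for &chunk in &self.chunks {
            let chunk_end = chunk_start + chunk.len();
            if byte_idx < chunk_end {
                return &chunk.as_bytes()[byte_idx - chunk_start..];
            }
            chunk_start = chunk_end;
        }
        &[]
    }
}

fn main() {
    // Two chunks of 8 and 13 bytes; the chunks themselves are valid UTF-8.
    let text = ChunkedText { chunks: vec!["тся.\n", "2. Когда"] };

    // 'т' (U+0442) is 0xD1 0x82, so offset 1 is inside the character.
    let bytes = text.bytes_at(1);
    assert_eq!(bytes[0], 0x82);
    assert_eq!(bytes.len(), 7);

    // Offset 12 is the second byte of 'К' (0xD0 0x9A) in the second chunk.
    assert_eq!(text.bytes_at(12)[0], 0x9A);

    // Past the end of the text: an empty slice.
    assert!(text.bytes_at(21).is_empty());
    println!("ok");
}
```

The point of the design is that a misaligned offset yields bytes rather than a panic, leaving it to the consumer to decide what to do with a partial character.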
I was able to find steps that segfault the parser: just keep pasting абвгдежзийклмнопрстуфхцчшщъыьэюя at the beginning of the file until it crashes. For some reason it only reproduces with the latest release on Arch (i.e. 23.10) but not on master (f992c3b5).
I'm not really sure it's the same issue, as the stack trace differs:
#0 0x00007f97ac07a70c in last_block (s=0x55dd2dbc6f50) at /usr/src/debug/helix/helix-23.10/runtime/grammars/sources/markdown/tree-sitter-markdown/src/scanner.c:236
#1 scan (s=0x55dd2dbc6f50, lexer=0x55dd2da859a0, valid_symbols=0x7f97ac0c90bc <ts_external_scanner_states+188>) at /usr/src/debug/helix/helix-23.10/runtime/grammars/sources/markdown/tree-sitter-markdown/src/scanner.c:1347
#2 0x000055dd2bb72313 in ts_parser__lex (parse_state=41, version=<optimized out>, self=0x55dd2da859a0) at src/./parser.c:427
#3 ts_parser__advance (allow_node_reuse=<optimized out>, version=0, self=0x55dd2da859a0) at src/./parser.c:1441
#4 ts_parser_parse (self=0x55dd2da859a0, old_tree=<optimized out>, input=...) at src/./parser.c:1933
#5 0x000055dd2b0efd66 in tree_sitter::Parser::parse_with<&[u8], helix_core::syntax::{impl#16}::parse::{closure_env#2}> (self=0x7f97acf95cf0, callback=0x7ffdcd978d00, old_tree=...) at /build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tree-sitter-0.20.10/binding_rust/lib.rs:537
#6 helix_core::syntax::LanguageLayer::parse (self=0x55dd2d8e12d8, parser=0x7f97acf95cf0, source=...) at helix-core/src/syntax.rs:1403
#7 helix_core::syntax::{impl#13}::update::{closure#1} (ts_parser=<optimized out>) at helix-core/src/syntax.rs:1125
#8 std::thread::local::LocalKey<core::cell::RefCell<helix_core::syntax::TsParser>>::try_with<core::cell::RefCell<helix_core::syntax::TsParser>, helix_core::syntax::{impl#13}::update::{closure_env#1}, core::result::Result<(), helix_core::syntax::Error>> (f=..., self=<optimized out>)
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/thread/local.rs:270
#9 std::thread::local::LocalKey<core::cell::RefCell<helix_core::syntax::TsParser>>::with<core::cell::RefCell<helix_core::syntax::TsParser>, helix_core::syntax::{impl#13}::update::{closure_env#1}, core::result::Result<(), helix_core::syntax::Error>> (f=...)
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/thread/local.rs:246
#10 helix_core::syntax::Syntax::update (self=0x55dd2daae1a0, old_source=..., source=..., changeset=<optimized out>) at helix-core/src/syntax.rs:1094
#11 0x000055dd2b778684 in helix_view::document::Document::apply_impl (self=0x55dd2daae000, transaction=<optimized out>, view_id=..., emit_lsp_notification=true) at helix-view/src/document.rs:1203
#12 0x000055dd2b779b74 in helix_view::document::Document::apply_inner (self=0x55dd2daae000, transaction=0x7ffdcd9795a0, view_id=..., emit_lsp_notification=true) at helix-view/src/document.rs:1296
#13 0x000055dd2b65dcf0 in helix_view::document::Document::apply (self=0x55dd2daae000, transaction=0x7ffdcd9795a0, view_id=...) at helix-view/src/document.rs:1308
#14 helix_term::commands::paste_impl (values=..., doc=0x55dd2daae000, view=0x55dd2dabb0a0, action=<optimized out>, count=1, mode=helix_view::document::Mode::Normal) at helix-term/src/commands.rs:4076
#15 0x000055dd2b65ea6a in helix_term::commands::paste (editor=0x7ffdcd97b378, register=<optimized out>, pos=<optimized out>, count=1) at helix-term/src/commands.rs:4153
Payload in parse_with
[111, 99, 115, 47, 96, 32, 208, 184, 32, 208, 178, 209, 139, 208, 191, 208, 190, 208, 187, 208, 189, 208, 184, 209, 130, 208, 181, 58, 10, 10, 96, 96, 96, 98, 97, 115, 104, 10, 109, 100, 98, 111, 111, 107, 32, 115, 101, 114, 118, 101, 32, 45, 45, 111, 112, 101, 110, 10, 96, 96, 96, 10, 10, 62, 32, 42, 42, 208, 146, 208, 189, 208, 184, 208, 188, 208, 176, 208, 189, 208, 184, 208, 181, 33, 42, 42, 10, 62, 10, 62, 32, 208, 148, 208, 187, 209, 143, 32, 209, 141, 209, 130, 208, 190, 208, 179, 208, 190, 32, 208, 178, 208, 176, 208, 188, 32, 208, 189, 208, 181, 208, 190, 208, 177, 209, 133, 208, 190, 208, 180, 208, 184, 208, 188, 32, 109, 100, 66, 111, 111, 107, 58, 10, 10, 96, 96, 96, 98, 97, 115, 104, 10, 99, 97, 114, 103, 111, 32, 105, 110, 115, 116, 97, 108, 108, 32, 109, 100, 98, 111, 111, 107, 10, 96, 96, 96, 10, 10, 35, 35, 35, 32, 208, 148, 208, 187, 209, 143, 32, 209, 128, 208, 176, 208, 183, 209, 128, 208, 176, 208, 177, 208, 190, 209, 130, 209, 135, 208, 184, 208, 186, 208, 190, 208, 178, 10, 10, 96, 96, 96, 98, 97, 115, 104, 10, 99, 97, 114, 103, 111, 32, 100, 111, 99, 32, 45, 45, 110, 111, 45, 100, 101, 112, 115, 32, 45, 45, 111, 112, 101, 110, 10, 96, 96, 96, 10]
The failing function:
https://github.com/MDeiml/tree-sitter-markdown/blob/aaf76797aa8ecd9a5e78e0ec3681941de6c945ee/tree-sitter-markdown/src/scanner.c#L238-L240
with open_blocks simply empty: {size = 0, capacity = 0, items = 0x0}
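For readers who want to inspect the parse_with payload quoted above, here is a small sketch that decodes its first 29 bytes with the standard library. The helper name `decode_payload` is hypothetical. It also shows that cutting the same bytes in the middle of a two-byte character is reported as an error rather than decoded.

```rust
/// Decode a byte payload as UTF-8 (hypothetical helper around the std call).
fn decode_payload(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    // First 29 bytes of the payload quoted in the report.
    let prefix: &[u8] = &[
        111, 99, 115, 47, 96, 32, 208, 184, 32, 208, 178, 209, 139, 208, 191,
        208, 190, 208, 187, 208, 189, 208, 184, 209, 130, 208, 181, 58, 10,
    ];

    // The prefix is valid UTF-8 and mixes ASCII with Cyrillic text.
    assert_eq!(decode_payload(prefix), Ok("ocs/` и выполните:\n"));

    // Cutting after byte 208 (the first half of 'и') leaves an incomplete
    // sequence; only the first 6 bytes are valid.
    let err = decode_payload(&prefix[..7]).unwrap_err();
    assert_eq!(err.valid_up_to(), 6);
    println!("ok");
}
```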
UPD: sometimes it panics at ropey::byte_slice or similar
OK, here is an even more detailed case that also reproduces on master and the latest release:
- Put the cursor at the beginning of the file
- Then paste абвгдежзийклмнопрстуфхцчшщъыьэюя
- Add a new line
- Paste the Cyrillic line from step 2 exactly four times
Now it should panic with byte_slice(): Byte range does not align with char boundaries: range 2263..2457
stack trace
#6 0x0000556f0826041b in ropey::slice::RopeSlice::byte_slice<core::ops::range::Range<usize>> (self=0x7ffd64a350e8, byte_range=...)
at /home/kitsu/.cargo/registry/src/github.com-1ecc6299db9ec823/ropey-1.6.1/src/slice.rs:703
#7 0x0000556f081f0470 in helix_core::syntax::HighlightConfiguration::injection_pair (self=0x556f0929e020, query_match=0x7ffd64a35910, source=...)
at helix-core/src/syntax.rs:1824
#8 0x0000556f081f0801 in helix_core::syntax::HighlightConfiguration::injection_for_match (self=0x556f0929e020, query=0x556f0929e0c0,
query_match=0x7ffd64a35910, source=...) at helix-core/src/syntax.rs:1856
#9 0x0000556f082713a6 in helix_core::syntax::{impl#13}::update::{closure#1} (ts_parser=0x7fa78b5f8cb0) at helix-core/src/syntax.rs:1148
#10 0x0000556f081f6b54 in std::thread::local::LocalKey<core::cell::RefCell<helix_core::syntax::TsParser>>::try_with<core::cell::RefCell<helix_core::syntax::TsParser>, helix_core::syntax::{impl#13}::update::{closure_env#1}, core::result::Result<(), helix_core::syntax::Error>> (self=0x556f08cd52f8, f=...)
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/thread/local.rs:445
#11 0x0000556f081f5cbd in std::thread::local::LocalKey<core::cell::RefCell<helix_core::syntax::TsParser>>::with<core::cell::RefCell<helix_core::syntax::TsParser>, helix_core::syntax::{impl#13}::update::{closure_env#1}, core::result::Result<(), helix_core::syntax::Error>> (self=0x556f08cd52f8, f=...)
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/thread/local.rs:421
#12 0x0000556f081ecaa7 in helix_core::syntax::Syntax::update (self=0x556f0942cf40, old_source=..., source=..., changeset=0x7ffd64a37390)
at helix-core/src/syntax.rs:1094
#13 0x0000556f07681e94 in helix_view::document::Document::apply_impl (self=0x556f0942ceb0, transaction=0x7ffd64a37390, view_id=..., emit_lsp_notification=true)
at helix-view/src/document.rs:1203
#14 0x0000556f0768289c in helix_view::document::Document::apply_inner (self=0x556f0942ceb0, transaction=0x7ffd64a37390, view_id=..., emit_lsp_notification=true)
at helix-view/src/document.rs:1296
#15 0x0000556f07682ab1 in helix_view::document::Document::apply (self=0x556f0942ceb0, transaction=0x7ffd64a37390, view_id=...)
at helix-view/src/document.rs:1308
#16 0x0000556f06b28c89 in helix_term::commands::paste_impl (values=..., doc=0x556f0942ceb0, view=0x556f093a1c90, action=helix_term::commands::Paste::Cursor,
count=1, mode=helix_view::document::Mode::Insert) at helix-term/src/commands.rs:4076
#17 0x0000556f06b290c7 in helix_term::commands::paste_bracketed_value (cx=0x7ffd64a37828, contents=...) at helix-term/src/commands.rs:4087
Note: it won't reproduce if you just paste the whole input at once.
Panics at: https://github.com/helix-editor/helix/blob/master/helix-core/src/syntax.rs#L1824
capture.node at injection_pair is: [QueryCapture { node: {Node code_fence_content (60, 0) - (74, 0)}, index: 0 }, QueryCapture { node: {Node code_fence_content (60, 0) - (74, 0)}, index: 1 }]
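The byte_slice panic reported here happens when a node's byte range starts or ends inside a multi-byte character. One possible defensive approach, shown purely as an illustration and not as the fix adopted in Helix, is to widen the range outward to the nearest char boundaries before slicing. All three function names below are hypothetical, and the sketch assumes `start <= end`.

```rust
/// Largest char boundary that is <= `idx` (clamped to the text length).
fn floor_boundary(text: &str, idx: usize) -> usize {
    let mut i = idx.min(text.len());
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest char boundary that is >= `idx` (clamped to the text length).
fn ceil_boundary(text: &str, idx: usize) -> usize {
    let mut i = idx.min(text.len());
    while !text.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Widens `start..end` outward to char boundaries so slicing cannot panic.
/// Assumes `start <= end`.
fn widened_slice(text: &str, start: usize, end: usize) -> &str {
    &text[floor_boundary(text, start)..ceil_boundary(text, end)]
}

fn main() {
    // The line pasted in the reproduction steps: 32 letters, 2 bytes each.
    let line = "абвгдежзийклмнопрстуфхцчшщъыьэюя";
    assert_eq!(line.chars().count(), 32);
    assert_eq!(line.len(), 64);

    // Every odd byte offset is inside a character, so `&line[1..3]` would
    // panic; the widened range 0..4 covers the two characters it touches.
    assert_eq!(widened_slice(line, 1, 3), "аб");
    println!("ok");
}
```

Widening changes which text is returned, so whether it is acceptable depends on the caller; it only shows how a misaligned range can be made safe to slice.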
I got the same panic message when editing, but at a different line of source. It is not yet reproducible, since I've lost some unsaved text.
- panic message
thread 'main' panicked at 'byte index 693 is not a char boundary; it is inside '偶' (bytes 692..695) of `。
- line of source
哈哈`[...]', /Users/a/.cargo/registry/src/github.com-1ecc6299db9ec823/unicode-segmentation-1.10.1/src/grapheme.rs:553:22