monaco-editor
[Bug] `Range` does not treat surrogate pairs as a single column
Reproducible in vscode.dev or in VS Code Desktop?
- [X] Not reproducible in vscode.dev or VS Code Desktop
Reproducible in the monaco editor playground?
- [ ] Not reproducible in the monaco editor playground
Monaco Editor Playground Code
var text = [
    'Latin text',
    'نص عربي',
    '𠀤𠀤中文文本',
].join('\n');

var editor = monaco.editor.create(document.getElementById('container'), {
    value: text
});

// Apply the same 1..2 column range to the first character of each line.
var decorations = editor.deltaDecorations(
    [],
    [
        {
            range: new monaco.Range(1, 1, 1, 2),
            options: { inlineClassName: 'redText' }
        },
        {
            range: new monaco.Range(2, 1, 2, 2),
            options: { inlineClassName: 'redText' }
        },
        {
            range: new monaco.Range(3, 1, 3, 2),
            options: { inlineClassName: 'redText' }
        },
    ]
);
Actual Behavior
The character in the first column of line 3, 𠀤, is not rendered in red like the first-column characters of the other lines, even though the redText class is specified on the same range for all three lines.
Expected Behavior
The redText class should be applied to the 𠀤 character, so it should be rendered in red.
Additional Context
I'm fairly new to the Monaco API. After reading the Class Range documentation, I understood that the unit of a range, within a single line, is a column, which I assume is also the unit of movement of the blinking cursor, which in turn (I guess) corresponds to a single rendered Unicode grapheme cluster.
So I would expect that, with the same 1-2 range applied to all three lines, the first character (column) of each line would be red.
My suspicion is that Range does not actually work with columns in that sense, but with single UTF-16 code units. For this reason the 𠀤 character, which is a surrogate pair (2 UTF-16 code units), is not fully included in the range.
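The difference between code units and code points can be checked in plain JavaScript, outside the editor. This is only a sketch of the string behavior the suspicion relies on. It makes no claim about Monaco's internals, and the variable names are illustrative.

```javascript
// The two characters used in the repro: one outside the Basic
// Multilingual Plane, one inside it.
const astral = '𠀤';
const bmp = '中';

// String.prototype.length counts UTF-16 code units.
const astralUnits = astral.length; // 2 (a surrogate pair)
const bmpUnits = bmp.length;       // 1

// Spreading a string iterates by code point instead.
const astralCodePoints = [...astral].length; // 1

// The first code unit of a surrogate pair is a high surrogate
// (0xD800..0xDBFF).
const firstUnit = astral.charCodeAt(0);
const startsWithHighSurrogate = firstUnit >= 0xd800 && firstUnit <= 0xdbff;

console.log(astralUnits, astralCodePoints, bmpUnits, startsWithHighSurrogate);
```

If a range's columns count code units, an end column of 2 on line 3 stops between the two halves of the pair, which would be consistent with the character not turning red.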
Specifying 1-3 as the range does in fact make the first character red. 1-4 does not make the second 𠀤 red, but 1-5 does. 1-6 makes the third character, 中, red; that one consists of a single code unit.
Are my assumptions correct? Or is this the intended behavior and I am missing something from the docs?
My suspect is that Range is not actually working with columns, but single UTF16 code units.
This is correct.
Are my assumptions correct?
I think so.
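Given that confirmation, one possible workaround is to convert a count of characters into a code-unit column before building the Range. The helper below is a minimal sketch: `columnAfterCodePoints` is a hypothetical name, not part of the Monaco API. It counts Unicode code points, not grapheme clusters, so combining sequences or emoji joined with ZWJ would still need something like `Intl.Segmenter`.

```javascript
// Hypothetical helper (not a Monaco API): returns the 1-based column that
// lies just past the first `count` code points of `lineText`, where
// columns are counted in UTF-16 code units.
function columnAfterCodePoints(lineText, count) {
    let column = 1;
    let seen = 0;
    // for...of iterates a string by code point, so a surrogate pair
    // arrives as one two-unit string.
    for (const ch of lineText) {
        if (seen === count) {
            break;
        }
        column += ch.length; // 1 for a BMP character, 2 for a surrogate pair
        seen++;
    }
    return column;
}

const line3 = '𠀤𠀤中文文本';
console.log(columnAfterCodePoints(line3, 1)); // 3
console.log(columnAfterCodePoints(line3, 2)); // 5
console.log(columnAfterCodePoints(line3, 3)); // 6
```

These values match the end columns found by trial in the report (1-3, 1-5, 1-6). In the playground, the third decoration could then use `new monaco.Range(3, 1, 3, columnAfterCodePoints(line3, 1))`.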
So I guess the documentation should be updated, then? I mean, isn't it a bit confusing to call the unit of Range a "column" when it behaves differently from the text cursor?
We closed this issue because we don't plan to address it in the foreseeable future. If you disagree and feel that this issue is crucial: we are happy to listen and to reconsider.
If you wonder what we are up to, please see our roadmap and issue reporting guidelines.
Thanks for your understanding, and happy coding!