[Bug] `Range` does not treat surrogate pairs as a single column

Open debevv opened this issue 2 years ago • 2 comments

Reproducible in vscode.dev or in VS Code Desktop?

  • [X] Not reproducible in vscode.dev or VS Code Desktop

Reproducible in the monaco editor playground?

Monaco Editor Playground Code

var text = [
    'Latin text',
    'نص عربي',
    '𠀤𠀤中文文本',
].join('\n');

var editor = monaco.editor.create(document.getElementById('container'), {
    value: text
});

var decorations = editor.deltaDecorations(
    [],
    [
        {
            range: new monaco.Range(1, 1, 1, 2),
            options: { inlineClassName: 'redText' }
        },
        {
            range: new monaco.Range(2, 1, 2, 2),
            options: { inlineClassName: 'redText' }
        },
        {
            range: new monaco.Range(3, 1, 3, 2),
            options: { inlineClassName: 'redText' }
        },
    ]
);

Actual Behavior

The character in the first column of line 3, 𠀤, is not rendered in red like the first characters of the other lines, even though the redText class is applied to the same column range (1 to 2) on all three lines. [Screenshot from 2022-06-08 15-17-05]

Expected Behavior

The redText class should be applied to the 𠀤 character, hence it should be rendered in red.

Additional Context

I'm fairly new to the Monaco API. After reading the Range class documentation, I understood that the unit of a range within a line is a column, which I assumed is also the unit of movement of the blinking cursor, which in turn (I guess) corresponds to a single rendered Unicode grapheme cluster. So, given the same 1-2 range on all three lines, I would expect the first character (column) of each line to be red.

My suspicion is that Range does not actually work with columns in that sense, but with single UTF-16 code units. That would explain why the 𠀤 character, which is a surrogate pair (2 UTF-16 code units), is not fully covered by the range.
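To illustrate the difference between code units and code points, here is a minimal sketch in plain JavaScript (no Monaco involved). It only shows how the string from line 3 is counted; it says nothing about how Monaco handles columns internally.

```javascript
// Line 3 of the playground sample.
const line = '𠀤𠀤中文文本';

// String.prototype.length counts UTF-16 code units.
const codeUnits = line.length;

// Iterating a string (here via spread) walks Unicode code points,
// so each surrogate pair is counted once.
const codePoints = [...line].length;

// Length of the first code point in UTF-16 code units.
const firstCharUnits = [...line][0].length;

console.log(codeUnits, codePoints, firstCharUnits);
```

The two counts differ by two, one extra code unit for each 𠀤.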

Specifying columns 1-3 as the range does in fact make the first character red. 1-4 does not make the second 𠀤 red, but 1-5 does. 1-6 also makes the third character red, which consists of a single code unit.
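Assuming the end column really is counted in UTF-16 code units, the end columns observed above can be computed from a number of characters (code points). The helper below is a hypothetical sketch, not part of the Monaco API, and its name `columnAfterCodePoints` is made up for illustration.

```javascript
// Hypothetical helper (not a Monaco API): returns the 1-based column,
// counted in UTF-16 code units, that lies just after the first `count`
// code points of `lineText`.
function columnAfterCodePoints(lineText, count) {
    let column = 1; // columns are 1-based
    let seen = 0;
    for (const ch of lineText) { // for...of iterates by code point
        if (seen === count) {
            break;
        }
        column += ch.length; // 2 for a surrogate pair, 1 otherwise
        seen++;
    }
    return column;
}

const line3 = '𠀤𠀤中文文本';
const endColumns = [1, 2, 3].map((n) => columnAfterCodePoints(line3, n));
console.log(endColumns); // [3, 5, 6]
```

In the playground, something like `new monaco.Range(3, 1, 3, columnAfterCodePoints(line3, 1))` should then cover the first 𠀤. Note that this counts code points, not grapheme clusters, so combining sequences would still need separate handling (for example with `Intl.Segmenter`).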

Are my assumptions correct? Or is this the intended behavior and I am missing something from the docs?

debevv, Jun 08 '22 13:06

> My suspicion is that Range is not actually working with columns, but single UTF-16 code units.

This is correct.

> Are my assumptions correct?

I think so.

hediet, Jul 19 '22 13:07

So I guess the documentation should be updated then? I mean, isn't it a bit confusing to call the unit of Range a "column" when it behaves differently from the text cursor?

debevv, Jul 25 '22 14:07

We closed this issue because we don't plan to address it in the foreseeable future. If you disagree and feel that this issue is crucial: we are happy to listen and to reconsider.

If you wonder what we are up to, please see our roadmap and issue reporting guidelines.

Thanks for your understanding, and happy coding!

hediet, Mar 13 '23 11:03