
Subroutines breaking capture tokenizing inside of referenced capture group

Open RedCMD opened this issue 4 years ago • 3 comments

When calling a subroutine on a capture group via \\g<1>, the call removes all the previous tokens from capture groups that aren't rematched in the subroutine.

To reproduce, create a syntax highlighting extension with this code:

{
	"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
	"name": "Subroutines Syntax",
	"scopeName": "source.redcmd.syntax.subroutines",
	"patterns": [
		{ "include": "#subroutines" }
	],
	"repository": {
		"subroutines": {
			"match": "((a)|(b)|(c)|(d))-\\g<1>",
			"captures": {
				"2": { "name": "strong variable.other.constant" },
				"3": { "name": "strong keyword.control" },
				"4": { "name": "strong support.type" },
				"5": { "name": "strong constant.character.escape" }
			}
		}
	}
}

image

The expected outcome is that it highlights all text in the format [abcd]-[abcd]:

a-a
a-b
a-c
a-d
b-a
b-b
b-c
b-d
c-a
c-b
c-c
c-d
d-a
d-b
d-c
d-d

Like so: image

But instead, every token attached to a capture group (groups 2 to 5) that doesn't get rematched (and fails) in the subroutine call gets purged: image

RedCMD, Jan 02 '22 12:01

Another way to see it is to create a highlighter like this: image

"match": "(A)(B)(C)(D)(E)(F)(G)(H)(I)(J)\\g<6>?(K)(L)(M)(N)(O)(P)",
"captures": {
	"1":  { "name": "markup.underline invalid" },
	"2":  { "name": "markup.underline string.regexp" },
	"3":  { "name": "markup.underline string" },
	"4":  { "name": "markup.underline constant.character.escape" },
	"5":  { "name": "markup.underline support.function" },
	"6":  { "name": "markup.underline constant.numeric" },
	"7":  { "name": "markup.underline comment" },
	"8":  { "name": "markup.underline support.type" },
	"9":  { "name": "markup.underline variable" },
	"10": { "name": "markup.underline variable.other.constant" },
	"11": { "name": "markup.underline keyword" },
	"12": { "name": "markup.underline punctuation.definition.list.begin.markdown" },
	"13": { "name": "markup.underline header" },
	"14": { "name": "markup.underline constant.regexp" },
	"15": { "name": "markup.underline keyword.control" },
	"16": { "name": "markup.underline punctuation.definition.tag" }
}

and a test file with: ABCDEFGHIJKLMNOP

It should then colour the letters like so: image

This does not trigger the subroutine call \\g<6> (which is optional) and thus works fine.

But if you insert an F in between J and K (ABCDEFGHIJFKLMNOP), the call is made and breaks the tokenization of every group ((F)(G)(H)(I)(J)) between (F) (group 6) and the caller \\g<6>: image

This is extremely annoying when you have to copy and paste large amounts of the same regex over and over again instead of just being able to call back to the existing group. And you can't just set the code off to the side and never have it run; the subroutine call will still manage to break itself.
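
For comparison, the copy-and-paste workaround mentioned above could look roughly like the sketch below for the first grammar. It assumes the alternation is simply written out a second time instead of being called with \\g<1>, so the second half gets its own capture groups (7 to 10, following the usual left-to-right group numbering) mapped to the same scope names. The "name" and "scopeName" values are hypothetical, and this has not been verified against every engine version.

```json
{
	"$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
	"name": "Subroutines Syntax (workaround)",
	"scopeName": "source.redcmd.syntax.subroutines.workaround",
	"patterns": [
		{ "include": "#subroutines" }
	],
	"repository": {
		"subroutines": {
			"match": "((a)|(b)|(c)|(d))-((a)|(b)|(c)|(d))",
			"captures": {
				"2": { "name": "strong variable.other.constant" },
				"3": { "name": "strong keyword.control" },
				"4": { "name": "strong support.type" },
				"5": { "name": "strong constant.character.escape" },
				"7": { "name": "strong variable.other.constant" },
				"8": { "name": "strong keyword.control" },
				"9": { "name": "strong support.type" },
				"10": { "name": "strong constant.character.escape" }
			}
		}
	}
}
```

This doubles the pattern and the captures table for a single reuse, which is exactly the duplication a working subroutine call would avoid.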

RedCMD, Jan 13 '22 03:01

https://github.com/microsoft/vscode-textmate/issues/208

RedCMD, Nov 05 '23 07:11