vscode-textmate
multiply applied capture groups seem to ignore some captures
This is a bit of an edge case, and I'm not sure how it is supposed to be handled. I don't have a concrete use case; I'm just trying to implement my own parser in Python using this as a reference.
sample grammar
{
    "scopeName": "test",
    "patterns": [
        {
            "match": "((a)) ((b) c) (d (e)) ((f) )",
            "name": "matched",
            "captures": {
                "1": {"name": "g1"},
                "2": {"name": "g2"},
                "3": {"name": "g3"},
                "4": {"name": "g4"},
                "5": {"name": "g5"},
                "6": {"name": "g6"},
                "7": {
                    "patterns": [
                        {"match": "f", "name": "g7f"},
                        {"match": " ", "name": "g7space"}
                    ]
                },
                "8": {"name": "g8"}
            }
        }
    ]
}
sample file
a b c d e f z
tokenization using VS Code
$ node vsc.js cap.json f
Tokenizing line: a b c d e f z
- token from 0 to 1 (a) with scopes test, matched, g1, g2
- token from 1 to 2 ( ) with scopes test, matched
- token from 2 to 3 (b) with scopes test, matched, g3, g4
- token from 3 to 5 ( c) with scopes test, matched, g3
- token from 5 to 6 ( ) with scopes test, matched
- token from 6 to 8 (d ) with scopes test, matched, g5
- token from 8 to 9 (e) with scopes test, matched, g5, g6
- token from 9 to 10 ( ) with scopes test, matched
- token from 10 to 11 (f) with scopes test, matched, g7f
- token from 11 to 12 ( ) with scopes test, matched, g7space
- token from 12 to 14 (z) with scopes test
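Capture 8 (the inner `(f)`) seems to be dropped: the token from 10 to 11 only gets g7f. I expect the f to have the scopes test, matched, g7f, g8, as in the output of my Python implementation: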
>>> # ...
>>> state, regions = highlight_line(compiler, state, 'a b c d e f z', first_line=True)
>>> import pprint
>>> pprint.pprint(regions)
(Region(start=0, end=1, scope=('test', 'matched', 'g1', 'g2')),
Region(start=1, end=2, scope=('test', 'matched')),
Region(start=2, end=3, scope=('test', 'matched', 'g3', 'g4')),
Region(start=3, end=5, scope=('test', 'matched', 'g3')),
Region(start=5, end=6, scope=('test', 'matched')),
Region(start=6, end=8, scope=('test', 'matched', 'g5')),
Region(start=8, end=9, scope=('test', 'matched', 'g5', 'g6')),
Region(start=9, end=10, scope=('test', 'matched')),
Region(start=10, end=11, scope=('test', 'matched', 'g7f', 'g8')),
Region(start=11, end=12, scope=('test', 'matched', 'g7space')),
Region(start=12, end=13, scope=('test',)))
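The expected nesting can be sketched in a few lines of Python. This is a minimal illustration only, not vscode-textmate's algorithm and not the parser used above: `apply_rule`, `merge`, and `RULE` are hypothetical names, Python's `re` stands in for Oniguruma (an assumption that holds for this simple pattern), and adjacent characters with identical scopes are simply merged into one region. Captures are applied in ascending group order, so an outer group's scopes are pushed before those of the groups nested inside it.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    scope: tuple


# The single rule from the sample grammar, written as a Python dict.
RULE = {
    "match": "((a)) ((b) c) (d (e)) ((f) )",
    "name": "matched",
    "captures": {
        "1": {"name": "g1"}, "2": {"name": "g2"},
        "3": {"name": "g3"}, "4": {"name": "g4"},
        "5": {"name": "g5"}, "6": {"name": "g6"},
        "7": {"patterns": [
            {"match": "f", "name": "g7f"},
            {"match": " ", "name": "g7space"},
        ]},
        "8": {"name": "g8"},
    },
}


def merge(scopes):
    """Collapse per-character scope lists into contiguous regions."""
    regions = []
    for i, stack in enumerate(scopes):
        scope = tuple(stack)
        if regions and regions[-1].scope == scope:
            regions[-1] = regions[-1]._replace(end=i + 1)
        else:
            regions.append(Region(i, i + 1, scope))
    return regions


def apply_rule(line, root, rule):
    """Tokenize `line` with one match rule, honouring nested captures."""
    scopes = [[root] for _ in line]  # one scope stack per character
    m = re.search(rule["match"], line)
    if m is None:
        return merge(scopes)
    for i in range(m.start(), m.end()):
        scopes[i].append(rule["name"])
    # Groups are numbered by opening parenthesis, so ascending order
    # visits an outer group before any group nested inside it.
    for key in sorted(rule["captures"], key=int):
        cap = rule["captures"][key]
        start, end = m.span(int(key))
        if start == -1:  # group did not participate in the match
            continue
        if "name" in cap:
            for i in range(start, end):
                scopes[i].append(cap["name"])
        # Sub-patterns are matched only inside the captured span.
        subs = [(re.compile(p["match"]), p["name"])
                for p in cap.get("patterns", [])]
        pos = start
        while subs and pos < end:
            for regex, name in subs:
                sub = regex.match(line, pos, end)
                if sub and sub.end() > pos:
                    for i in range(pos, sub.end()):
                        scopes[i].append(name)
                    pos = sub.end()
                    break
            else:
                pos += 1  # nothing matched here; move on one character
    return merge(scopes)
```

Calling `apply_rule('a b c d e f z', 'test', RULE)` yields the same eleven regions as the output above, including `('test', 'matched', 'g7f', 'g8')` for the `f` at 10 to 11, because group 8 is applied after group 7's sub-patterns rather than being discarded.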
I have also tried this in TextMate, and it appears to handle the captures the way you expect.

Here is the grammar converted to TextMate's format:
{   patterns = (
        {
            match = "((a)) ((b) c) (d (e)) ((f) )";
            name = "matched";
            captures = {
                1 = { name = "g1"; };
                2 = { name = "g2"; };
                3 = { name = "g3"; };
                4 = { name = "g4"; };
                5 = { name = "g5"; };
                6 = { name = "g6"; };
                7 = {
                    patterns = (
                        { match = "f"; name = "g7f"; },
                        { match = " "; name = "g7space"; },
                    );
                };
                8 = { name = "g8"; };
            };
        },
    );
}