Stop-parsing symbol (--%) scope ends only at end of line, but a pipe character can also end it
Environment
- Editor and Version: VS Code: 1.26.1
- Your primary theme: Monokai Dimmed
Issue Description
The stop-parsing symbol, --%, is currently scoped all the way to the end of the line. I thought that was how the symbol worked too, until I reread the documentation on it (about_Parsing): it can also be terminated by the pipe character, but, as far as I have determined, only if the pipe is outside of any double-quoted construct.
It's also possible to use environment-variable substitution with the CMD %variable% syntax. However, if the variable name contains a double quote, PowerShell processes the quote first, before any substitution happens (if it happens at all: just as in CMD, if the variable is not found, no substitution occurs). So it's actually impossible to reliably identify a real variable reference.
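The termination rule described above can be sketched as a small scanner. This is a minimal sketch assuming the rule is exactly as observed in this issue (a | outside double quotes ends the region, and the curly quotes U+201C/U+201D count as double quotes just like "); it ignores the word-boundary checks on --% and is not PowerShell's actual parser. The function name stop_parsing_extent is hypothetical.

```python
from typing import Optional, Tuple

# ASCII " plus the Unicode curly quotes U+201C and U+201D, which this issue
# reports PowerShell treats the same way here (an observed behavior, assumed).
DOUBLE_QUOTES = {'"', "\u201c", "\u201d"}


def stop_parsing_extent(line: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the text governed by --%, or None if absent.

    Simplified model: the region starts right after --% and ends at the
    first | that is outside double quotes, or at the end of the line.
    Word-boundary checks around --% are ignored.
    """
    marker = line.find("--%")
    if marker == -1:
        return None
    start = marker + len("--%")
    in_quotes = False
    for i in range(start, len(line)):
        ch = line[i]
        if ch in DOUBLE_QUOTES:
            # Any of the three quote characters opens or closes a quoted run.
            in_quotes = not in_quotes
        elif ch == "|" and not in_quotes:
            # The pipe itself is not part of the region; it belongs to the pipeline.
            return start, i
    return start, len(line)


line = 'cmd --% /c echo "a|b" | sort'
start, end = stop_parsing_extent(line)
print(repr(line[start:end]))  # ' /c echo "a|b" '
```

The quoted | is kept inside the region, while the unquoted one ends it, which is the behavior the grammar's end pattern needs to reproduce.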
Expected Behavior
The syntax should at least support the stop-parsing symbol's scope ending at a pipe, the same way PowerShell actually does.
Possible tmLanguage modification:
{
"begin": "(?<!\\w)(--%)(?!\\w)",
"beginCaptures": {
"1": {
"name": "keyword.control.powershell"
}
},
"end": "$|\\|",
"patterns": [
{
"match": "[^\"\\x{201C}\\x{201D}]+?",
"name": "string.unquoted.powershell"
},
{
"begin": "(?:\"|\\x{201C}|\\x{201D})",
"beginCaptures": {
"0": {
"name": "punctuation.definition.string.begin.powershell"
}
},
"end": "(?:\"|\\x{201C}|\\x{201D})(?!\"|\\x{201C}|\\x{201D})|$",
"endCaptures": {
"0": {
"name": "punctuation.definition.string.end.powershell"
}
},
"name": "string.quoted.double.powershell"
}
],
"comment": "This should be moved to the repository at some point."
}
I included the Unicode double quotes (U+201C and U+201D) because I determined that PowerShell treats them the same as the ASCII double quote here, just as it does elsewhere.
I intend to submit this as a PR.
Can someone explain something about the regex sample above, which works? The first pattern originally used .+ where the negated character class is now. Why can a lazy .+? keep from matching |, yet not keep from matching the quotes, so that I had to replace .+ with a negated character class that includes them? As soon as I made it lazy, the end match started finding the pipe, but the next pattern's match wouldn't find the quotes, because the first match swallowed them up. How does it know not to swallow the pipe? And if it isn't lazy, it does swallow the pipe into that match.
I don't see a .+ in your original post. Where were you seeing the issue?
It was originally in the "match", in place of the negated class that now includes the quotes ("match": "[^\"\\x{201C}\\x{201D}]+?"). It was just .+; since I expected that to capture the pipe (kind of), I changed it to .+?, and then the end still caught the pipe. I take it the engine is actually working on the end match and the patterns at the same time, but not on each of the patterns at the same time. Possibly, as each pattern is tested, the end match is concatenated onto it as an optional ending?
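That hypothesis can be checked with a simplified model. The sketch below assumes the rule that, to my understanding, vscode-textmate applies inside a begin/end region: at the current position the end regex and every regex in "patterns" are searched, the match that starts earliest wins, and on a tie the end pattern wins (unless applyEndPatternLast is set), followed by the patterns in listed order. Python's re stands in for Oniguruma, and the quoted-string begin/end rule is approximated by a single hypothetical "[^"]*" match; scan_region is a hypothetical name.

```python
import re
from typing import List, Tuple

UNQUOTED = "string.unquoted.powershell"
QUOTED = "string.quoted.double.powershell"
END_RE = r"\||$"  # end pattern from the proposed rule: a pipe or end of line


def scan_region(text: str, end: str, patterns: List[Tuple[str, str]]):
    """Simplified model of tokenizing the inside of a begin/end region.

    At each position every regex (end first, then patterns in listed order)
    is searched from that position. The earliest-starting match wins; on a
    tie the earlier entry in the list wins, so the end pattern beats any
    sub-pattern that starts at the same column. Text skipped between
    matches is not recorded here. Returns (tokens, end_column).
    """
    candidates = [(None, re.compile(end))]
    candidates += [(scope, re.compile(rx)) for scope, rx in patterns]
    tokens = []
    pos = 0
    while True:
        best = None
        for scope, rx in candidates:
            m = rx.search(text, pos)
            # Strict < keeps the earlier candidate on ties.
            if m and (best is None or m.start() < best[1].start()):
                best = (scope, m)
        if best is None:  # nothing matches at all: region runs to the end
            return tokens, len(text)
        scope, m = best
        if scope is None:  # the end pattern won
            return tokens, m.start()
        tokens.append((scope, m.group()))
        pos = m.end()


TEXT = ' echo "a|b" | sort'  # what follows --% on the line
QUOTE = (QUOTED, r'"[^"]*"')  # stand-in for the quoted-string rule

greedy = scan_region(TEXT, END_RE, [(UNQUOTED, r".+"), QUOTE])
lazy = scan_region(TEXT, END_RE, [(UNQUOTED, r".+?"), QUOTE])
negated = scan_region(TEXT, END_RE, [(UNQUOTED, r'[^"]+?'), QUOTE])
print(greedy[1], lazy[1], negated[1])  # 18 8 12
```

In this model laziness is not what protects the pipe: .+ starts at column 0, before any pipe, so it wins and consumes the rest of the line, pipe included. .+? matches only one character per step, which gives the end pattern a chance to win the tie at every column. The same tie-break hands each quote to .+?, because it is listed before the quoted-string rule, so the lazy version even stops at the pipe inside "a|b". Excluding the quotes from the character class lets the string rule win at the quote, which is why [^\"\\x{201C}\\x{201D}]+? works.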
I would probably structure the rule so that "begin" matches --% and "end" is either end of line or |, with everything in between scoped as string.unquoted.powershell. That gives the most consistent result:
{
"begin": "(?<!\\w)(--%)(?!\\w)",
"beginCaptures": {
"1": {
"name": "keyword.control.powershell"
}
},
"patterns": [
{
"match": ".+?",
"name": "string.unquoted.powershell"
}
],
"end": "(?=\\||$)"
}
<dict>
<key>begin</key>
<string>(?&lt;!\w)(--%)(?!\w)</string>
<key>beginCaptures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>keyword.control.powershell</string>
</dict>
</dict>
<key>end</key>
<string>(?=\||$)</string>
<key>patterns</key>
<array>
<dict>
<key>match</key>
<string>.+?</string>
<key>name</key>
<string>string.unquoted.powershell</string>
</dict>
</array>
</dict>
This produces the expected result.
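The JSON and plist versions of the rule encode the same data. As a sketch (not necessarily how this repository builds its grammar files), Python's standard json and plistlib modules can generate one from the other; plistlib also escapes the < in the lookbehind as &lt;, which well-formed XML requires. The helper name json_rule_to_plist is hypothetical.

```python
import json
import plistlib

# The suggested rule as JSON text (backslashes doubled, as JSON requires).
RULE_JSON = r'''
{
  "begin": "(?<!\\w)(--%)(?!\\w)",
  "beginCaptures": {"1": {"name": "keyword.control.powershell"}},
  "patterns": [{"match": ".+?", "name": "string.unquoted.powershell"}],
  "end": "(?=\\||$)"
}
'''


def json_rule_to_plist(rule_json: str) -> bytes:
    """Convert a tmLanguage rule from JSON text to an XML plist."""
    rule = json.loads(rule_json)
    # sort_keys=True emits keys alphabetically: begin, beginCaptures, end, patterns.
    return plistlib.dumps(rule, fmt=plistlib.FMT_XML, sort_keys=True)


xml = json_rule_to_plist(RULE_JSON)
print(xml.decode("utf-8"))
```

Round-tripping the output through plistlib.loads gives back the same dictionary, which is a quick way to confirm that the two forms really match.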

For reasons I don't understand, using a non-capturing group ((?:)) in the "end" position still clobbers the characters it matches when there is a broad match in the "patterns" section. Using a positive lookahead ((?=)) makes the region end just before the text matched by the "end" pattern, which prevents the clobbering.
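One likely explanation (an assumption, not checked against the engine source): whatever an end regex consumes becomes part of the --% region, and the parent grammar resumes scanning only after it. A consuming (?:\||$) therefore takes the pipe away from whatever rule would otherwise scope it as a pipeline operator, while a zero-width lookahead leaves the pipe in place. A minimal Python model of just that hand-off, with a hypothetical PIPE_OPERATOR rule standing in for the parent grammar:

```python
import re

# Hypothetical parent-grammar rule that would scope | as a pipeline operator.
PIPE_OPERATOR = re.compile(r"\|")


def text_left_for_parent(region_text: str, end: str) -> str:
    """Return what remains after the region's end pattern matches.

    Characters consumed by the end regex belong to the region; the parent
    grammar only resumes scanning at m.end(). If the end never matches,
    this simplified model treats the rest of the line as part of the region.
    """
    m = re.compile(end).search(region_text)
    return region_text[m.end():] if m else ""


def parent_sees_pipe(region_text: str, end: str) -> bool:
    """True if the parent's pipe rule gets to match right where the region ends."""
    return PIPE_OPERATOR.match(text_left_for_parent(region_text, end)) is not None


text = " /c dir | sort"  # what follows --% on the line
print(repr(text_left_for_parent(text, r"(?:\||$)")))  # ' sort'
print(repr(text_left_for_parent(text, r"(?=\||$)")))  # '| sort'
```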
By the way, I added a detailed guide on how to contribute: https://github.com/PowerShell/EditorSyntax/blob/master/CONTRIBUTING.md#contributing-guide
@msftrncs you should consider contributing 😊 I think you could be a great help to the project and all the users that use it!