stop parsing symbol only ends at end of line, but pipe character is also acceptable

Open msftrncs opened this issue 7 years ago • 6 comments

Environment

  • Editor and Version: VS Code: 1.26.1
  • Your primary theme: Monokai Dimmed

Issue Description

The stop-parsing symbol, --%, is currently scoped all the way to the end of the line. I thought that was how the symbol worked too, until I reread the documentation on the matter (about_Parsing): it can also be terminated by the pipe character, but, as far as I have determined, only when the pipe is outside any double-quoted construct.

It's also possible to use environment variable substitution with the CMD %variable% syntax. However, if the environment variable name contains a double quote, PowerShell processes the quote first, before the substitution has occurred (if it occurs at all: just like in CMD, when the variable is not found, no substitution takes place). So it's actually impossible for the grammar to determine what is a real variable reference.
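The termination rule described above can be sketched in a few lines. This is only a minimal Python model of the behavior as reported in this issue, not PowerShell's actual tokenizer; the function name `stop_parsing_end` and the sample command line are hypothetical, and treating the two typographic quotes like the ASCII double quote is an assumption taken from the proposed grammar in this issue.

```python
# Characters assumed to toggle the "inside double quotes" state.
DOUBLE_QUOTES = {'"', "\u201c", "\u201d"}

def stop_parsing_end(rest: str) -> int:
    """Return the index in `rest` (the text after --%) where the scope ends.

    The scope ends at the first pipe found outside double quotes,
    or at the end of the line when there is no such pipe.
    """
    in_quotes = False
    for i, ch in enumerate(rest):
        if ch in DOUBLE_QUOTES:
            in_quotes = not in_quotes  # a doubled quote toggles twice
        elif ch == "|" and not in_quotes:
            return i
    return len(rest)

rest = ' "a | b" %PATH% | findstr x'  # hypothetical text following --%
print(repr(rest[:stop_parsing_end(rest)]))
```

The pipe inside the quotes is skipped and the scope stops just before the second pipe. An unterminated quote keeps the scope open to the end of the line in this model.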

Expected Behavior

The syntax should at least support the stop-parsing symbol's scope ending at a pipe, the same way PowerShell itself does.

Possible tmLanguage modification:

{
	"begin": "(?<!\\w)(--%)(?!\\w)",
	"beginCaptures": {
		"1": {
			"name": "keyword.control.powershell"
		}
	},
	"end": "$|\\|",
	"patterns": [
		{
			"match": "[^\"\\x{201C}\\x{201D}]+?",
			"name": "string.unquoted.powershell"
		},
		{
			"begin": "(?:\"|\\x{201C}|\\x{201D})",
			"beginCaptures": {
				"0": {
					"name": "punctuation.definition.string.begin.powershell"
				}
			},
			"end": "(?:\"|\\x{201C}|\\x{201D})(?!\"|\\x{201C}|\\x{201D})|$",
			"endCaptures": {
				"0": {
					"name": "punctuation.definition.string.end.powershell"
				}
			},
			"name": "string.quoted.double.powershell"
		}
	],
	"comment": "This should be moved to the repository at some point."
},

I included the Unicode double quotes because I determined that PowerShell treats them the same as the ASCII double quote here, as it does elsewhere.

I do have the intention of putting this in a PR.

msftrncs avatar Aug 27 '18 05:08 msftrncs

Can someone explain, for the regex sample shown above (which works), why the lazy .+ that was originally in the first pattern can keep from matching | but cannot keep from matching the quotes? That is why the quotes had to go into a negated character class, which replaced the .+. When I made it lazy (.+?), the end match immediately found the pipe, but the next pattern never found the quotes, because the first pattern swallowed them up. How does it know not to swallow the pipe? When it wasn't lazy, it did swallow the pipe into that match.

msftrncs avatar Aug 27 '18 05:08 msftrncs

> Can someone explain, in the regex sample shown above, which works, why the lazy .+ that was in the first pattern can keep from matching | but cannot prevent itself from matching the quotes, hence why they had to be in a negative character class which replaced the .+.

I don't see a .+ in your original post. Where were you seeing the issue?

omniomi avatar Aug 27 '18 13:08 omniomi

> I don't see a .+ in your original post. Where were you seeing the issue?

It was originally in the match, in place of the negated class that now includes the quotes: "match": "[^\"\\x{201C}\\x{201D}]+?". It was just .+ at first. I would have expected that to capture the pipe … kind of … so I changed it to .+? and the end pattern could then still catch the pipe. I take it that the engine is actually working both match sets at the same time, but not each of the patterns at the same time … possibly, as each pattern is tested, the end match is concatenated on as an optional ending?
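One way to reason about this is a small simulation. The sketch below is a simplified Python model, assuming the engine behaves like a typical TextMate tokenizer: at each position it searches the end pattern and every child pattern, takes the match that starts earliest, and on a tie prefers the rule listed first, with the end pattern at the front of the list. That ordering is an assumption about the engine, not something confirmed in this thread. The name `tokenize`, the sample line, and the single-match `quoted` rule (standing in for the begin/end quote rule) are hypothetical.

```python
import re

def tokenize(text, end, patterns):
    """Simulate the body of a begin/end rule on one line of text."""
    # The end pattern is listed first, so it wins ties (assumed default).
    rules = [("end", re.compile(end))] + [(n, re.compile(p)) for n, p in patterns]
    pos, tokens = 0, []
    while pos <= len(text):
        best = None
        for name, rx in rules:
            m = rx.search(text, pos)
            # Strict "<" keeps the earlier-listed rule when starts are equal.
            if m and (best is None or m.start() < best[1].start()):
                best = (name, m)
        if best is None:
            break
        name, m = best
        tokens.append((name, m.group()))
        if name == "end":
            break
        pos = max(m.end(), pos + 1)  # always advance, even on an empty match
    return tokens

END = r"\||$"
line = ' "a|b" c | more'
greedy = tokenize(line, END, [("unquoted", r".+")])
lazy = tokenize(line, END, [("unquoted", r".+?"), ("quoted", r'"[^"]*"')])
negated = tokenize(line, END, [("unquoted", r'[^"]+?'), ("quoted", r'"[^"]*"')])
for result in (greedy, lazy, negated):
    print(result)
```

In this model the greedy .+ starts earlier than the pipe and consumes the whole line in one match, so the end pattern only fires at end of line. The lazy .+? consumes one character per step, so the end pattern gets a chance at every position and wins the tie at the pipe. The quote rule never wins a tie against .+? because it is listed later, which is why the negated class is needed to leave the quote character for the next rule.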

msftrncs avatar Aug 28 '18 01:08 msftrncs

I would likely structure that rule so that the begin matches on --% and the end is either EOL or |, with everything in between scoped as string.unquoted.powershell, which will give the most consistent result:

{
	"begin": "(?<!\\w)(--%)(?!\\w)",
	"beginCaptures": {
		"1": {
			"name": "keyword.control.powershell"
		}
	},
	"patterns": [
		{
			"match": ".+?",
			"name": "string.unquoted.powershell"
		}
	],
	"end": "(?=\\||$)"
}

<dict>
	<key>begin</key>
	<string>(?&lt;!\w)(--%)(?!\w)</string>
	<key>beginCaptures</key>
	<dict>
		<key>1</key>
		<dict>
			<key>name</key>
			<string>keyword.control.powershell</string>
		</dict>
	</dict>
	<key>end</key>
	<string>(?=\||$)</string>
	<key>patterns</key>
	<array>
		<dict>
			<key>match</key>
			<string>.+?</string>
			<key>name</key>
			<string>string.unquoted.powershell</string>
		</dict>
	</array>
</dict>

This produces the expected result:

[Screenshot: stopproc]

For reasons I don't understand, using a non-capturing group ((?:)) in the "end" position still clobbers the characters specified when you have a broad match in the "patterns" section. Using a positive lookahead ((?=)) makes the end of the capture occur just before the specified "end" pattern, which prevents the clobbering.
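A likely explanation, offered as a sketch rather than a statement about this grammar engine specifically: a non-capturing group only skips creating a numbered capture, it still consumes the text it matches, whereas a lookahead is zero-width. The Python snippet below illustrates the difference with the standard `re` module; the sample string is hypothetical.

```python
import re

text = "args | next"

# Non-capturing group: no capture group is created, but the pipe is consumed.
consuming = re.search(r"(?:\||$)", text)

# Positive lookahead: asserts that a pipe or end of line follows, consumes nothing.
zero_width = re.search(r"(?=\||$)", text)

print(consuming.span(), repr(consuming.group()))
print(zero_width.span(), repr(zero_width.group()))
```

With the consuming form the pipe is part of the end match, so it falls inside the rule. With the lookahead the rule ends just before the pipe, which is left for whatever rule matches next.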

omniomi avatar Aug 28 '18 13:08 omniomi

By the way, I added a detailed guide on how to contribute: https://github.com/PowerShell/EditorSyntax/blob/master/CONTRIBUTING.md#contributing-guide

omniomi avatar Aug 28 '18 14:08 omniomi

@msftrncs you should consider contributing 😊 I think you could be a great help to the project and all the users that use it!

TylerLeonhardt avatar Aug 28 '18 14:08 TylerLeonhardt