PSScriptAnalyzer icon indicating copy to clipboard operation
PSScriptAnalyzer copied to clipboard

No 'Variable Referenced Before Assignment' Rule?

Open Vacant0mens opened this issue 3 years ago • 10 comments

From what I can tell, there's no linting rule to check if a variable is null or unassigned when it's being referenced. This would be super helpful to avoid nullref errors when running a script or module. I know there's a "variable assigned but not referenced/used" rule, but that's kind of the opposite of what I'm looking for.

I know this functionality exists in pylint for python, but I sorely miss this when working with PowerShell.

Does this exist? If so, how does one enable it?

Apologies if there is already an issue for this somewhere. I couldn't find one.

Vacant0mens avatar Apr 08 '21 21:04 Vacant0mens

Thanks @Vacant0mens that is a feature of strict mode in PowerShell which can be a (runtime) workaround-- because of dynamic scoping in PowerShell this is impossible to always get right at the script analysis level-- this issue is further described in https://github.com/PowerShell/PSScriptAnalyzer/issues/1641

SydneyhSmith avatar Apr 20 '21 22:04 SydneyhSmith

If it can't be caught as a "referenced before assignment" situation, is there a way to have the linter point out variables that are potentially null within the scope of the script itself? or is that covered under the same case that you mentioned?

Vacant0mens avatar Apr 20 '21 22:04 Vacant0mens

So just to expand on this a bit, since we discussed it.

PowerShell uses dynamic scope. That means that a variable's value depends on the runtime stack. If you show me a scriptblock or a script that references a variable that isn't assigned anywhere in the script and ask whether it's defined, the answer is "depends how you call it at runtime". And because you can do anything at runtime, like make calls or variable definitions dependent on the output of a web request, there's really no way to say statically.

As an explicit note, this is a departure from Python (for example), which uses lexical scope. If you don't assign an x in your function, that reference will look up the lexical scope stack at parse time. If it doesn't exist lexically, that's a parse error. If it does exist, a closure is created around that function and the external variables it references. PowerShell doesn't do that.

Does that mean we're out of luck? Perhaps not if we change the parameters of the problem. Perhaps we can just look for variables that can only be defined by virtue of dynamic scope — ones that haven't been assigned in the immediate scope. However, naively applying that rule is going to turn up a lot of results that users will see as false positives even though they're not. Consider:

$extension = '*.txt'
Get-ChildItem | Where-Object { $_.Extension -like $extension }

Where-Object takes a scriptblock and implicitly depends on dynamic scope to provide the values for things like $extension. Lots of people write functionality like this without ever understanding that they depend on dynamic scope, because when you define a variable and reference it immediately in a child scope, dynamic and lexical scope behave identically.

So we could build in some "obvious false positive" logic. For that PSScriptAnalyzer must understand that Where-Object is invoking the scriptblock it's given immediately in the scope where $extension is defined in order to not flag it. Same with ForEach-Object and several other cmdlets.

So we could build a simple heuristic that says "for this command name, any scriptblock passed to it should be considered to be executed in scope". And in fact for the converse problem (the unassigned variable problem), we also want to know when scriptblocks are dot-sourced (since that propagates assignments upward), so we also need logic to understand that. Starting to get a bit more complicated.

But then there's things like Invoke-Command, which when given a scriptblock, execute it synchronously in a child scope. Unless certain parameters like -ComputerName, -HostName or -Session are provided, in which case the variable actually references a state in another runspace... So we start to need a way to generally infer parameter binding, which is also (1) not generally decidable statically and (2) quite complex even without taking runtime state into account.

And we haven't yet discussed things like Set-Variable -Name x or Set-Item 'variable:x', which can manipulate variable state dynamically and which also need some concept of parameter binding to figure out.

Basically, to embark on this a little bit is going to annoy people and create the expectation of being correct, when really it's just a heuristic. To annoy people less (but bear in mind it's still a heuristic), we'd need to do a lot of yak shaving.

Of course, we'd accept a contribution that implemented it. But it's not a workitem that we're likely to tackle internally any time soon.

rjmholt avatar Apr 21 '21 00:04 rjmholt

Okay. I'm getting the problem a bit more now. Thanks for that.

So I guess what I'm really asking for is something like "Warning: in the current script context/file, this variable is used/referenced, but in the lines above it there are no $var ='s nor set-variable's." (i.e. you misspelled the variable, or forgot to give it a value.) Kind of just the first layer of "obviously something's missing" syntactically.

Unless I'm misunderstanding how all this works? I have seen "you're overriding a system variable" warnings, so it seems possible with my limited, new knowledge.

Vacant0mens avatar Apr 21 '21 01:04 Vacant0mens

Yeah it's possible. It should be done on a per-scriptblock basis (the AST visitor should build a stack of dictionaries of variables as it visits each scriptblock down a lexical scope). But the scriptblocks for scriptblock cmdlets will create false positives that people will complain about, so we need to be smarter than that.

rjmholt avatar Apr 21 '21 01:04 rjmholt

Oh okay. And that's what the other issue was getting into, right? I think I'm seeing it now.

So PSScriptAnalyzer doesn't technically do syntax parsing on the text itself as text? It actually attempts to load the script file in as a script block first? Or did I miss something? It kind of sounds like cmdlet functions, modules, and other related items (things that don't execute within current context) will need to be handled differently than normal scripts/scriptblocks, but basic scripts/scriptblocks would be a good place to start. (Differently, meaning in a more advanced way, but based in similar parameters) Read: it seems like breaking it up into stages/phases would be pretty easy.

At least for VSCode, I was just hoping there'd be a parser or something to do it within the text itself, rather than checking through the whole call stack or context.

Vacant0mens avatar Apr 21 '21 02:04 Vacant0mens

So PSScriptAnalyzer doesn't technically do syntax parsing on the text itself as text?

PSScriptAnalyzer takes the script text and uses the Parser API to get the AST and tokens corresponding to that text. Those outputs are simply structured representations of the text as the PowerShell parser sees them -- they don't contain any extra or special information.

So at that point, we have an AST and we'd need to sift through each scriptblock in it (scriptblocks essentially correspond to scopes) and build a table of variables. You can see an example of that here:

https://github.com/PowerShell/PSScriptAnalyzer/blob/58c44234d44dfd0db35bb532906963e08fde8621/Rules/UseDeclaredVarsMoreThanAssignments.cs#L117-L158

But I stress that this is just a syntactic representation, we have to do the hard work of understanding the semantics. So for example, that linked rule makes a bunch of assumptions that aren't true in a number of cases, which is what https://github.com/PowerShell/PSScriptAnalyzer/issues/1641 is all about.

rjmholt avatar Apr 21 '21 04:04 rjmholt

You might find this blog post to be a helpful introduction on how PSSA works

rjmholt avatar Apr 21 '21 05:04 rjmholt

Okay, now that I mostly understand what's going on (I think), is there a rule in the works that's basically the inverse of the rule you referenced? (i.e. check variable references against the table created with the AST's) Or does that rule need to be created still?

By the way, thank you for taking the time to help me understand this.

Vacant0mens avatar Apr 21 '21 07:04 Vacant0mens

There's no such rule today, for the reasons I describe above.

rjmholt avatar Apr 21 '21 09:04 rjmholt