Uninitialised check wrong after `walrus` in `if`
UPDATE 2025/10/8: The example in the original issue report (preserved below) has been fixed, but Pyrefly still generates some related false positives.
Example 1:
import pathlib
def f(mod: str, stubs_path: pathlib.Path):
_, *submods = mod.split(".")
if (path := stubs_path.joinpath(*submods, "__init__.pyi")).is_file():
return path
assert submods, path # false positive: `path` is uninitialized
Example 2:
def check():
if (y := 2) <= 1:
return
print(y) # false positive: `y` is uninitialized
check()
--- original issue report ---
Describe the Bug
from typing import *
def f(v):
x: int
if (x := v) and v:
print(x) # E: `x` may be uninitialized
Sandbox Link
https://pyrefly.org/sandbox/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMoBUAUEQCYCmwUwAFAG4CUAXEVG1AB5OYoyvtIqNDlCYBeKIygBDFKUkt2SqAhCoYwhkSA&version=3.12
(Only applicable for extension issues) IDE Information
No response
Can I work on this?
assigned, thanks!
@yadngdanny97 I've been looking into this problem and would appreciate some help. Here are my findings.
Investigation results
I've prepared some examples with my own explanation (see below). It might be helpful.
In my opinion, the root cause of the bug is improper handling of NamedExpr during branch processing. At minimum, this affects Stmt::If, Expr::If, and Expr::BoolOp.
When Pyrefly processes branches, it creates multiple flows from the base, creates bindings (NarrowOps) for each, and then merges them with their styles. However, NamedExpr is handled incorrectly: the name information should be added not only to the current flow, but also to subsequent ones, or during branch merging, FlowInfo with NamedExpr should have higher priority. That is, FlowInfo::Uninitialized + FlowInfo::Other(NamedExpr) => FlowInfo::Other.
See Example 1, Example 2, Example 6
Another issue (this might not be a problem - please let me know where I'm wrong) is that Pyrefly creates bindings for branches of BoolOp and flow information that we are guaranteed not to reach (or might not reach) at runtime. See Example 3, Example 4 and Example 5
Possible Solution
First of all, we need to determine whether NamedExpr will be reached in true and false paths. For this, we need to analyze the test field of Stmt::If following short-circuiting rules.
- If
NamedExpris always reachable, then I propose introducing a new value (for example)FlowStyle::Walrus, which would have priority overFlowStyle::Uninitializedand returnFlowStyle::Walruswhen merging. - If
NamedExpris possibly reachable, then we left styleFlowStyle::Other. - Otherwise, do nothing.
Questions
- I understand that I don't see the full picture of Pyrefly, so I want to figure out where I might be wrong. Is my understanding of the process flow correct?
- Why is
if.testprocessed entirely instead of stopping when a branch is 100% unreachable at runtime? (Example 3) - Is it good idea to add
FlowStyle::Walrus? Are there any alternatives to prioritize flow merging in another way? - I tried making some local changes, but the current implementation doesn't allow achieving the desired result because AST tree processing is done recursively. When in the "Expr::Named" node, there's no way to find out what the parent node is. I also don't know if it's possible to execute
ensure_expr, ignoring the creation of some bindings while processing others. In general, I'd like to understand your vision of potential implementation.
Examples
Example 1
def f(v):
if value := v:
print(value) # flow 1
else:
print(value) # flow 2 # OK, has default style `FlowStyle::Other`
Here everything works fine, because NarrowOps({Name("value"): ..., ...}) binds value to flow 2 and creates a new name in the flow with default FlowStyle::Other due to value missing in the scope.
Checks if name exists in the flow https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L873-L876
Creates new FlowInfo for the name with default value Other https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L312-L318
Example 2
def f(v):
value: int # here value has style FlowInfo: Uninitialized
if value := v:
print(value) # flow 1
else:
print(value) # flow 2 # copy flow info from base flow e.g. FlowInfo: Uninitialized
Compared to Example 1, when value binds to flow 2 and updates the existing FlowInfo for name value. During the update, the previous style FlowStyle::Uninitialized is used.
Updates existed name in the flow (base flow) https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L877-L880 Uses style from base flow https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L324-L339
Example 3
def f(v):
if True or (value := v):
print(value) # expected "`value` is uninitialized", but Pyrefly doesn't show anything
else:
print(value)
Pyrefly doesn't implement short-circuiting for BoolOp operations. This means all value expressions within the BoolOp are processed, resulting in the creation of the value binding.
Example 4
def f(v):
if v and (value := v):
print(value)
else:
print(value) # expected "`value` may be uninitialized", but Pyrefly doesn't show anything
Same as in Example 3.
Example 5
def f(v):
if False and (value := v):
print(value)
else:
print(value)
Panic https://github.com/facebook/pyrefly/issues/962
Example 6
def f(v):
x: int # here value has style FlowInfo: Uninitialized
if (x := v) and v:
print(x) # E: `x` may be uninitialized
Using the insights from the previous examples, here's what causes the original bug:
- Split the if test expression (
(x := v) and v) into two branch flows: 1st flow(x := v)and 2nd flowv. - When each branch is analyzed, it starts from the base flow, which has the
xname with styleFlowStyle::Uninitialized. - The 1st flow creates bindings for
xandvwithFlowStyle::Other. - The 2nd flow creates a binding for
vwithFlowStyle::Otherand "inherits" thexname with styleFlowStyle::Uninitializedfrom the base flow. - The 1st and 2nd flows are merged; flow info of the
xname uses rules https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L267-L289, which producesFlowStyle::PossiblyUninitialized(FlowStyle::Other+FlowStyle::Uninitialized=>FlowStyle::PossiblyUninitialized) - As a result, the if branch has flow info for the
xname with styleFlowStyle::PossiblyUninitialized.
This issue has someone assigned, but has not had recent activity for more than 2 weeks.
If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.
Thank you for your contributions!
@IDrokin117 thanks for taking the time to investigate this, and sorry it took so long to get back to you on this. I appreciate the detailed examples here.
Regarding your questions:
I understand that I don't see the full picture of Pyrefly, so I want to figure out where I might be wrong. Is my understanding of the process flow correct?
The key point I would add is that at the bindings step, where we determine whether or not something is initialized, we don't have access to any resolved types. This makes modeling short-circuiting behavior difficult, and any uninitialized variable analysis is by necessity conservative.
Why is if.test processed entirely instead of stopping when a branch is 100% unreachable at runtime? (Example 3)
We just don't have a sophisticated model of short-circuiting in boolean expressions during binding creation. I believe we do handle cases if the if.test is the True or False literal, for example if its False we don't emit errors in the body.
The reason why the existing handling is so simple is because in the binding layer we do not know about any types, so any short circuiting would have to be determined syntactically. For example, we could mark any walrus on the LHS of an and / or as always initialized, and on the RHS as possibly not initialized (unless the LHS is some truthy/falsy literal), but I think that would cause false positives.
It's worth noting that Pyright seems to do this: https://pyright-play.net/?code=CYUwZgBALiDOUEYAUBKAXAKAtiAPCAvBACoBOAriFjgJaT4CGAdsBEgJ4RpELrU45SIKOVJN%2BEAF6EI7DBlCQY8AEypMAuiQogIzVhy48%2BAgUJFiJ0onIXhocKAGZ1ErQDEGAG1i79bTm4IXg1TbHNRcQFrWSA
Is it good idea to add FlowStyle::Walrus? Are there any alternatives to prioritize flow merging in another way?
Flow merging as it is represented right now has a tendency to lose information in merges, since only one enum case can be represented at a time. We may want to make it a simple struct storing multiple flags instead, where each flag has its own merging rule.
I tried making some local changes, but the current implementation doesn't allow achieving the desired result because AST tree processing is done recursively. When in the "Expr::Named" node, there's no way to find out what the parent node is. I also don't know if it's possible to execute ensure_expr, ignoring the creation of some bindings while processing others. In general, I'd like to understand your vision of potential implementation.
I guess I'd split this into two parts:
- Is changing our representation of flow style and flow merging in a way that doesn't cause this false positive, which addresses this issue. After that, variables defined by walruses in guards would always be treated as initialized, which results in a conservative analysis that gives no false positives but some false negatives.
- Is improving our modeling of when a walrus is/is not always initialized. As mentioned earlier, I don't think we can model this precisely at the bindings step so even if we do the implementation below it would be able to handle literal True/False in a compound condition but not much else.
To handle compound guards with literal true/false, we could generate multiple flows for each part of the BoolOp.
Given if v and (x := y):
- in the False branch we would have one flow where v is False and x is never initialized, and another where v is True but y is False, so x does get initialized
- in the True branch x would always be initialized
If we do this we should be careful to merge the flows created during the guard immediately after processing the guard, to avoid creating an exponential number of flows.
This would allow us to short-circuit cases of False and x := ... and True or x := ... and detect x as being uninitialized. How we handle the other cases would depend on whether we want to have false positives (treat x on the RHS as possibly-uninitialized) or false negatives (treat x on the RHS as initialized).
Does that make sense? We can discuss it live on discord if that would help.
This issue has someone assigned, but has not had recent activity for more than 2 weeks.
If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.
Thank you for your contributions!
I just ran into a similar issue with the following code:
_, *submods = mod.split(".")
if (path := STUBS_PATH.joinpath(*submods, "__init__.pyi")).is_file():
return path
assert submods, path # ❌
ERROR `path` is uninitialized [unbound-name]
--> scripts/unstubbed_modules.py:65:21
|
65 | assert submods, path
| ^^^^
|
INFO 1 error
But since there's no and and/or or here, maybe this one has a different underlying cause?
I see this three times in one of my commits!
stop is marked uninitialized here.
output is marked uninitialized here. There's no and or or involved anywhere.
w is marked uninitialized here.
Guess I've learned to love the walrus.
Guess I've learned to love the walrus.
And I'm sure the walrus loves you too
Was this just fixed? This trips for me locally (pyrefly 0.36.1) on this snippet:
def hello(x: int, y: int) -> int | None:
if x == 5 and (z := x + y) == 7:
return z
ERROR `z` may be uninitialized [unbound-name]
--> test_repo/__init__.py:3:16
|
3 | return z
| ^
|
INFO 1 error
but the sandbox doesn't trigger this issue? Actually now that I look the OP's sandbox link also doesn't trigger the error, so maybe I misunderstand how to use the sandbox.
I have a similar issue with walrus inside if-statements that is still reproducible:
def check():
if (y := 2) <= 1:
return
print(y)
check()
https://pyrefly.org/sandbox/?project=N4IgZglgNgpgziAXKOBDAdgEwEYHsAeAdAA4CeSImMYABAMYAWMdA1gBQCUiAOujfzQi02pGogC8NAEwcaAHkkBGHnwFqATjAAuAV3Xpea4uojotIjr16Nm7S%2BhAAaEGU1gopQltwBbKBQBiGgAFUjcPGjQsPHx6XHRIAHM9VC0IeMJeIIBlGBgaBi0tYjhEAHoy12oPQlx1RLKYdDLMXDo4Mrp4pJS0%2BLKaMDqaVAA3VGhUbFg4hIhk9VT0vlxiPvQ4TPQyLQZ4gFpRmHU4ZZpJbhAAZkJFKUveEABfZ1Q6NKOAMWgYCiicAgkchPIA
Hey folks! Thanks for the bug reports. Some of these issues were fixed in https://github.com/facebook/pyrefly/issues/1251, which has been pushed to the sandbox but not released to PyPI. I'll cut a patch release, which will be out in a few hours.
Pyrefly 0.36.2 is out now. Any of the above issues that did not reproduce in the sandbox should be fixed in this release.
That seems to have done the trick for scipy-stubs: https://github.com/scipy/scipy-stubs/pull/932/files#diff-b831ede36f15de78332d4b7ad3239cb432074d18d8ce48394aca4f4ab98712f1
Thanks!
@IDrokin117 The bug involved actually already has a unit test, see test_walrus_on_first_branch_of_if in https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/test/flow.rs
It's a longstanding bug but is worse because I improved uninitialized local checks, so now we're more sensitive to bugs in flow merging because they show up as false positives; I encountered it just this week but didn't have time to work on a fix.
The fix is going to need to happen in the If statement handler of https://github.com/facebook/pyrefly/blob/main/pyrefly/binding/stmt.rs
You can actually probably copy the approach I used in 6f61942e20b9ec2a22c54116bd61bc28d93d3ff1, which fixed essentially the same problem for boolean operations... in both bool ops and if/else chains, the first test always evaluates, which has implications for walrus.
What's the actual problem? Generally wrong flows in if/else branching
We aren't modeling this correctly because we're branching off of the base flow in each branch, evaluating the test in the forked branch, and moving on. What actually happens is that the very first test evaluates in the base flow, and later tests evaluate in a "test flow"; in other words to handle
if A:
body_a
else if B:
body_b
else:
body_c
what we do today is take base_flow
- fork it to
flow_aand evaluate + apply narrows fromAthenbody_b - fork it to
flow_band apply narrowsnegation(A)then evaluate and apply narrows forBthen evaluatebody_b - fork it to
flow_cand apply narrowsnegation(A) && negation(B)then evaluatebody_c.
What ideally would happen instead is a bit more subtle:
- Take the base flow and evaluate
Ain it - Fork the base flow to a new flow
test - Fork
testtoflow_aand apply narrows fromAthen evaluatebody_a - Apply
negation(A)totestand evaluateB - Fork
testtoflow_band apply narrows fromB, then evaluatebody_b - Apply
negation(B)totest - Evaluate
body_cintest(we don't have to fork it because this is branch is the last one; in a non-exhaustiveif/elsewe would want to)
A quicker partial fix
The changes above are needed to actually get everything right - for example, a walrus in a middle branch is available when the next branch test runs, as in
if condition():
...
else if (b := foo() or condition()):
...
else if f(b): # Pyrefly will falsely say uninitialized local here
...
But the vast majority of the bug reports are coming from just the simpler scenario where a walrus happens in the very first test. As a result if we can fix just that one test so that it evaluates in the base flow, a lot of folks will be unblocked.
That quick fix fix will be slightly more complicated than 6f61942e20b9ec2a22c54116bd61bc28d93d3ff1 because we have to apply this idea in a loop instead of in a case with just two branches, but it's exactly the same concept.i
This issue has someone assigned, but has not had recent activity for more than 2 weeks.
If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.
Thank you for your contributions!
I'm not sure if this helps, but here's a simple example that still shows the issue on 0.40.1:
if not self.title and (title := tags.get("title")):
self.title = title.strip()
ERROR `title` may be uninitialized [unbound-name]
--> audiobooks/metadata/metadata_parsing.py:125:26
|
125 | self.title = title.strip()
| ^^^^^
|
This issue has someone assigned, but has not had recent activity for more than 2 weeks.
If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.
Thank you for your contributions!