pyrefly icon indicating copy to clipboard operation
pyrefly copied to clipboard

Uninitialised check wrong after `walrus` in `if`

Open ndmitchell opened this issue 5 months ago • 18 comments

UPDATE 2025/10/8: The example in the original issue report (preserved below) has been fixed, but Pyrefly still generates some related false positives.

Example 1:

import pathlib

def f(mod: str, stubs_path: pathlib.Path):
    _, *submods = mod.split(".")
    if (path := stubs_path.joinpath(*submods, "__init__.pyi")).is_file():
        return path
    assert submods, path  # false positive: `path` is uninitialized

sandbox

Example 2:

def check():
    if (y := 2) <= 1:
        return
    print(y)  # false positive: `y` is uninitialized

check()

sandbox

--- original issue report ---

Describe the Bug

from typing import *

def f(v):
    x: int
    if (x := v) and v:
        print(x) # E: `x` may be uninitialized

Sandbox Link

https://pyrefly.org/sandbox/?code=GYJw9gtgBALgngBwJYDsDmUkQWEMoBUAUEQCYCmwUwAFAG4CUAXEVG1AB5OYoyvtIqNDlCYBeKIygBDFKUkt2SqAhCoYwhkSA&version=3.12

(Only applicable for extension issues) IDE Information

No response

ndmitchell avatar Aug 13 '25 14:08 ndmitchell

Can I work on this?

IDrokin117 avatar Aug 14 '25 11:08 IDrokin117

assigned, thanks!

yangdanny97 avatar Aug 14 '25 11:08 yangdanny97

@yadngdanny97 I've been looking into this problem and would appreciate some help. Here are my findings.

Investigation results

I've prepared some examples with my own explanation (see below). It might be helpful.

In my opinion, the root cause of the bug is improper handling of NamedExpr during branch processing. At minimum, this affects Stmt::If, Expr::If, and Expr::BoolOp.

When Pyrefly processes branches, it creates multiple flows from the base, creates bindings (NarrowOps) for each, and then merges them with their styles. However, NamedExpr is handled incorrectly: the name information should be added not only to the current flow, but also to subsequent ones, or during branch merging, FlowInfo with NamedExpr should have higher priority. That is, FlowInfo::Uninitialized + FlowInfo::Other(NamedExpr) => FlowInfo::Other. See Example 1, Example 2, Example 6

Another issue (this might not be a problem - please let me know where I'm wrong) is that Pyrefly creates bindings for branches of BoolOp and flow information that we are guaranteed not to reach (or might not reach) at runtime. See Example 3, Example 4 and Example 5

Possible Solution

First of all, we need to determine whether NamedExpr will be reached in true and false paths. For this, we need to analyze the test field of Stmt::If following short-circuiting rules.

  1. If NamedExpr is always reachable, then I propose introducing a new value (for example) FlowStyle::Walrus, which would have priority over FlowStyle::Uninitialized and return FlowStyle::Walrus when merging.
  2. If NamedExpr is possibly reachable, then we left style FlowStyle::Other.
  3. Otherwise, do nothing.

Questions

  1. I understand that I don't see the full picture of Pyrefly, so I want to figure out where I might be wrong. Is my understanding of the process flow correct?
  2. Why is if.test processed entirely instead of stopping when a branch is 100% unreachable at runtime? (Example 3)
  3. Is it good idea to add FlowStyle::Walrus? Are there any alternatives to prioritize flow merging in another way?
  4. I tried making some local changes, but the current implementation doesn't allow achieving the desired result because AST tree processing is done recursively. When in the "Expr::Named" node, there's no way to find out what the parent node is. I also don't know if it's possible to execute ensure_expr, ignoring the creation of some bindings while processing others. In general, I'd like to understand your vision of potential implementation.
Examples

Example 1

Sandbox

def f(v):
    if value := v: 
        print(value) # flow 1
    else: 
        print(value) # flow 2  # OK, has default style `FlowStyle::Other`

Here everything works fine, because NarrowOps({Name("value"): ..., ...}) binds value to flow 2 and creates a new name in the flow with default FlowStyle::Other due to value missing in the scope.

Checks if name exists in the flow https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L873-L876 Creates new FlowInfo for the name with default value Other https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L312-L318

Example 2

Sandbox

def f(v):
    value: int # here value has style FlowInfo: Uninitialized
    if value := v: 
        print(value) # flow 1
    else: 
        print(value) # flow 2 # copy flow info from base flow e.g. FlowInfo: Uninitialized

Compared to Example 1, when value binds to flow 2 and updates the existing FlowInfo for name value. During the update, the previous style FlowStyle::Uninitialized is used.

Updates existed name in the flow (base flow) https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L877-L880 Uses style from base flow https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L324-L339

Example 3

Sandbox

def f(v):
    if True or (value := v):
        print(value) # expected "`value` is uninitialized", but Pyrefly doesn't show anything
    else:
        print(value)

Pyrefly doesn't implement short-circuiting for BoolOp operations. This means all value expressions within the BoolOp are processed, resulting in the creation of the value binding.

Example 4

Sandbox

def f(v):
    if v and (value := v):
        print(value)
    else:
        print(value) # expected "`value` may be uninitialized", but Pyrefly doesn't show anything

Same as in Example 3.

Example 5

def f(v):
    if False and (value := v):
        print(value)
    else:
        print(value)

Panic https://github.com/facebook/pyrefly/issues/962

Example 6

def f(v):
    x: int # here value has style FlowInfo: Uninitialized
    if (x := v) and v:
        print(x) # E: `x` may be uninitialized

Using the insights from the previous examples, here's what causes the original bug:

  1. Split the if test expression ((x := v) and v) into two branch flows: 1st flow (x := v) and 2nd flow v.
  2. When each branch is analyzed, it starts from the base flow, which has the x name with style FlowStyle::Uninitialized.
  3. The 1st flow creates bindings for x and v with FlowStyle::Other.
  4. The 2nd flow creates a binding for v with FlowStyle::Other and "inherits" the x name with style FlowStyle::Uninitialized from the base flow.
  5. The 1st and 2nd flows are merged; flow info of the x name uses rules https://github.com/facebook/pyrefly/blob/0.29.0/pyrefly/lib/binding/scope.rs#L267-L289, which produces FlowStyle::PossiblyUninitialized (FlowStyle::Other + FlowStyle::Uninitialized => FlowStyle::PossiblyUninitialized)
  6. As a result, the if branch has flow info for the x name with style FlowStyle::PossiblyUninitialized.

IDrokin117 avatar Aug 19 '25 15:08 IDrokin117

This issue has someone assigned, but has not had recent activity for more than 2 weeks.

If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.

Thank you for your contributions!

github-actions[bot] avatar Sep 04 '25 00:09 github-actions[bot]

@IDrokin117 thanks for taking the time to investigate this, and sorry it took so long to get back to you on this. I appreciate the detailed examples here.

Regarding your questions:

I understand that I don't see the full picture of Pyrefly, so I want to figure out where I might be wrong. Is my understanding of the process flow correct?

The key point I would add is that at the bindings step, where we determine whether or not something is initialized, we don't have access to any resolved types. This makes modeling short-circuiting behavior difficult, and any uninitialized variable analysis is by necessity conservative.

Why is if.test processed entirely instead of stopping when a branch is 100% unreachable at runtime? (Example 3)

We just don't have a sophisticated model of short-circuiting in boolean expressions during binding creation. I believe we do handle cases if the if.test is the True or False literal, for example if its False we don't emit errors in the body.

The reason why the existing handling is so simple is because in the binding layer we do not know about any types, so any short circuiting would have to be determined syntactically. For example, we could mark any walrus on the LHS of an and / or as always initialized, and on the RHS as possibly not initialized (unless the LHS is some truthy/falsy literal), but I think that would cause false positives.

It's worth noting that Pyright seems to do this: https://pyright-play.net/?code=CYUwZgBALiDOUEYAUBKAXAKAtiAPCAvBACoBOAriFjgJaT4CGAdsBEgJ4RpELrU45SIKOVJN%2BEAF6EI7DBlCQY8AEypMAuiQogIzVhy48%2BAgUJFiJ0onIXhocKAGZ1ErQDEGAG1i79bTm4IXg1TbHNRcQFrWSA

Is it good idea to add FlowStyle::Walrus? Are there any alternatives to prioritize flow merging in another way?

Flow merging as it is represented right now has a tendency to lose information in merges, since only one enum case can be represented at a time. We may want to make it a simple struct storing multiple flags instead, where each flag has its own merging rule.

I tried making some local changes, but the current implementation doesn't allow achieving the desired result because AST tree processing is done recursively. When in the "Expr::Named" node, there's no way to find out what the parent node is. I also don't know if it's possible to execute ensure_expr, ignoring the creation of some bindings while processing others. In general, I'd like to understand your vision of potential implementation.

I guess I'd split this into two parts:

  1. Is changing our representation of flow style and flow merging in a way that doesn't cause this false positive, which addresses this issue. After that, variables defined by walruses in guards would always be treated as initialized, which results in a conservative analysis that gives no false positives but some false negatives.
  2. Is improving our modeling of when a walrus is/is not always initialized. As mentioned earlier, I don't think we can model this precisely at the bindings step so even if we do the implementation below it would be able to handle literal True/False in a compound condition but not much else.

To handle compound guards with literal true/false, we could generate multiple flows for each part of the BoolOp.

Given if v and (x := y):

  • in the False branch we would have one flow where v is False and x is never initialized, and another where v is True but y is False, so x does get initialized
  • in the True branch x would always be initialized

If we do this we should be careful to merge the flows created during the guard immediately after processing the guard, to avoid creating an exponential number of flows.

This would allow us to short-circuit cases of False and x := ... and True or x := ... and detect x as being uninitialized. How we handle the other cases would depend on whether we want to have false positives (treat x on the RHS as possibly-uninitialized) or false negatives (treat x on the RHS as initialized).

Does that make sense? We can discuss it live on discord if that would help.

yangdanny97 avatar Sep 06 '25 04:09 yangdanny97

This issue has someone assigned, but has not had recent activity for more than 2 weeks.

If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.

Thank you for your contributions!

github-actions[bot] avatar Sep 26 '25 00:09 github-actions[bot]

I just ran into a similar issue with the following code:

_, *submods = mod.split(".")
if (path := STUBS_PATH.joinpath(*submods, "__init__.pyi")).is_file():
    return path
assert submods, path  # ❌ 
ERROR `path` is uninitialized [unbound-name]
  --> scripts/unstubbed_modules.py:65:21
   |
65 |     assert submods, path
   |                     ^^^^
   |
 INFO 1 error

But since there's no and and/or or here, maybe this one has a different underlying cause?

jorenham avatar Oct 04 '25 00:10 jorenham

I see this three times in one of my commits!

stop is marked uninitialized here.

output is marked uninitialized here. There's no and or or involved anywhere.

w is marked uninitialized here.

Guess I've learned to love the walrus.

rec avatar Oct 08 '25 10:10 rec

Guess I've learned to love the walrus.

And I'm sure the walrus loves you too

jorenham avatar Oct 08 '25 14:10 jorenham

Was this just fixed? This trips for me locally (pyrefly 0.36.1) on this snippet:

def hello(x: int, y: int) -> int | None:
    if x == 5 and (z := x + y) == 7:
        return z
ERROR `z` may be uninitialized [unbound-name]
 --> test_repo/__init__.py:3:16
  |
3 |         return z
  |                ^
  |
 INFO 1 error

but the sandbox doesn't trigger this issue? Actually now that I look the OP's sandbox link also doesn't trigger the error, so maybe I misunderstand how to use the sandbox.

rachtsingh avatar Oct 08 '25 15:10 rachtsingh

I have a similar issue with walrus inside if-statements that is still reproducible:

def check():
    if (y := 2) <= 1:
        return
    print(y)

check()

https://pyrefly.org/sandbox/?project=N4IgZglgNgpgziAXKOBDAdgEwEYHsAeAdAA4CeSImMYABAMYAWMdA1gBQCUiAOujfzQi02pGogC8NAEwcaAHkkBGHnwFqATjAAuAV3Xpea4uojotIjr16Nm7S%2BhAAaEGU1gopQltwBbKBQBiGgAFUjcPGjQsPHx6XHRIAHM9VC0IeMJeIIBlGBgaBi0tYjhEAHoy12oPQlx1RLKYdDLMXDo4Mrp4pJS0%2BLKaMDqaVAA3VGhUbFg4hIhk9VT0vlxiPvQ4TPQyLQZ4gFpRmHU4ZZpJbhAAZkJFKUveEABfZ1Q6NKOAMWgYCiicAgkchPIA

5j9 avatar Oct 08 '25 15:10 5j9

Hey folks! Thanks for the bug reports. Some of these issues were fixed in https://github.com/facebook/pyrefly/issues/1251, which has been pushed to the sandbox but not released to PyPI. I'll cut a patch release, which will be out in a few hours.

rchen152 avatar Oct 08 '25 17:10 rchen152

Pyrefly 0.36.2 is out now. Any of the above issues that did not reproduce in the sandbox should be fixed in this release.

rchen152 avatar Oct 08 '25 19:10 rchen152

That seems to have done the trick for scipy-stubs: https://github.com/scipy/scipy-stubs/pull/932/files#diff-b831ede36f15de78332d4b7ad3239cb432074d18d8ce48394aca4f4ab98712f1

Thanks!

jorenham avatar Oct 08 '25 19:10 jorenham

@IDrokin117 The bug involved actually already has a unit test, see test_walrus_on_first_branch_of_if in https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/test/flow.rs

It's a longstanding bug but is worse because I improved uninitialized local checks, so now we're more sensitive to bugs in flow merging because they show up as false positives; I encountered it just this week but didn't have time to work on a fix.

The fix is going to need to happen in the If statement handler of https://github.com/facebook/pyrefly/blob/main/pyrefly/binding/stmt.rs

You can actually probably copy the approach I used in 6f61942e20b9ec2a22c54116bd61bc28d93d3ff1, which fixed essentially the same problem for boolean operations... in both bool ops and if/else chains, the first test always evaluates, which has implications for walrus.

What's the actual problem? Generally wrong flows in if/else branching

We aren't modeling this correctly because we're branching off of the base flow in each branch, evaluating the test in the forked branch, and moving on. What actually happens is that the very first test evaluates in the base flow, and later tests evaluate in a "test flow"; in other words to handle

if A:
    body_a
else if B:
    body_b
else:
    body_c

what we do today is take base_flow

  • fork it to flow_a and evaluate + apply narrows from A then body_b
  • fork it to flow_b and apply narrows negation(A) then evaluate and apply narrows for B then evaluate body_b
  • fork it to flow_c and apply narrows negation(A) && negation(B) then evaluate body_c.

What ideally would happen instead is a bit more subtle:

  • Take the base flow and evaluate A in it
  • Fork the base flow to a new flow test
  • Fork test to flow_a and apply narrows from A then evaluate body_a
  • Apply negation(A) to test and evaluate B
  • Fork test to flow_b and apply narrows from B, then evaluate body_b
  • Apply negation(B) to test
  • Evaluate body_c in test (we don't have to fork it because this is branch is the last one; in a non-exhaustive if/else we would want to)

A quicker partial fix

The changes above are needed to actually get everything right - for example, a walrus in a middle branch is available when the next branch test runs, as in

if condition():
   ...
else if (b := foo() or condition()):
   ...
else if f(b):  # Pyrefly will falsely say uninitialized local here
  ...

But the vast majority of the bug reports are coming from just the simpler scenario where a walrus happens in the very first test. As a result if we can fix just that one test so that it evaluates in the base flow, a lot of folks will be unblocked.

That quick fix fix will be slightly more complicated than 6f61942e20b9ec2a22c54116bd61bc28d93d3ff1 because we have to apply this idea in a loop instead of in a case with just two branches, but it's exactly the same concept.i

stroxler avatar Oct 09 '25 16:10 stroxler

This issue has someone assigned, but has not had recent activity for more than 2 weeks.

If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.

Thank you for your contributions!

github-actions[bot] avatar Nov 05 '25 00:11 github-actions[bot]

I'm not sure if this helps, but here's a simple example that still shows the issue on 0.40.1:

        if not self.title and (title := tags.get("title")):
            self.title = title.strip()
ERROR `title` may be uninitialized [unbound-name]
   --> audiobooks/metadata/metadata_parsing.py:125:26
    |
125 |             self.title = title.strip()
    |                          ^^^^^
    |

pdc1 avatar Nov 09 '25 22:11 pdc1

This issue has someone assigned, but has not had recent activity for more than 2 weeks.

If you are still working on this issue, please add a comment so everyone knows. Otherwise, please unassign yourself and allow someone else to take over.

Thank you for your contributions!

github-actions[bot] avatar Nov 25 '25 00:11 github-actions[bot]