astroid
astroid copied to clipboard
Support "is None" constraints from if statements during inference
Steps
- [x] For new features or bug fixes, add a ChangeLog entry describing what your PR does.
- [x] Write a good description on what the PR does.
Description
Alright, this is a big one so please don't hold back. This PR adds infrastructure to support constraints on values during inference. Here's a simple example:
from astroid import extract_node
node = extract_node("""
def f(b):
if b:
x = 1
else:
x = None
if x is not None:
x #@
""")
print(node.inferred())
This prints [<Const.int l.4 at 0x14232a6e6d0>, <Const.NoneType l.6 at 0x142329b38b0>], using both possible values from the assignment statements. Of course, because of the x is not None if condition, only the int is possible, but astroid currently doesn't take this condition into account.
This PR changes how inference works so that only [<Const.int l.4 at 0x14232a6e6d0>] is printed.
Scope
I reviewed the existing pylint issues related to this (they are generally labelled as "topic-control-flow"). There were five types of constraints that were brought up:
x is None/x is not Nonex/not x(truthy/falsey value)isinstance(x, ...)x == ...- Combinations of the above with
and/or
There were also six primary locations where these constraints could be inferred (orthogonal to the constraints themselves):
-
if statement (as in above example)
-
and/orclauses. For example,x is not None and x[0] == 1; the secondxshould not be inferred asNone -
ternary if expressions. For example,
x[0] if x is not None else 1; the firstxshould not be inferred asNone -
reassignment within an if statement, e.g.
def f(): a = None if a is None: a = 1 return -a -
early return/raises within an if statement.
-
assert statements
This pull request addresses just the item 1's in each list. I used an approach that can handle 1-5 of the constraint types, and 1-3 of the locations. If this approach is approved, I think it would be pretty straightforward to make additional PRs for these items. (More on the other constraint locations below.)
Implementation overview
I added a new Constraint class to represent a constraint on a variable, like x is not None.
- This is an abstract class, and can be subclassed for different constraint types.
When a variable is inferred (see inference.py, infer_name), we search the ancestor chain from the node and look for containing if statements, using these to accumulate constraints on the variable. These constraints are added to the inference context (added a new attribute to InferenceContext).
Then, in _infer_stmts (bases.py), we filter the inferred values based on whether or not they satisfy all constraints for that variable.
Uninferableis considered to satisfy all constraints.- ⚠️ If there is at least one inferred value, but all inferred values are filtered out, a new
Uninferablevalue is yielded, rather than yielding nothing at all. I think this is the conservative approach, to avoid errors from theraise_if_nothing_inferreddecorator.
The above also applies to attributes (see infer_attribute).
Some comments/questions
- Because constraints are gathered by searching up the ancestor chain, this PR doesn't handle real control flow issues, e.g. with the reassignment example from above.
- Because the filtering occurs in
_infer_stmts, it only applies to variables and attributes. So for example, subscript nodes (e.g.my_array[0]) won't have their inferred values filtered. - Gathering constraints occurs every time
infer_name/infer_attributeis called, and since this requires going up the ancestor tree, this adds some overhead to inference. In principle constraints can be gathered through an AST visitor, but I thought that would be more complex so didn't choose it. (Thoughts welcome of course!) - I think adding the constraints to
InferenceContextis correct, but to be honest I'm fuzzy on the nuances of this class and how it's used. - Do you like the name "constraint" here? I go back on forth on it, but haven't come up with anything better.
Type of Changes
| Type | |
|---|---|
| ✓ | :sparkles: New feature |
Related Issue
Closes with test #791 Closes with test PyCQA/pylint#157 Closes with test PyCQA/pylint#1472 Closes with test PyCQA/pylint#2016 Closes with test PyCQA/pylint#2631 Closes with test PyCQA/pylint#2880
Hi @david-yz-liu, thanks for all of the work here. I think the size (and importance) of this PR has put off people from reviewing this but I think we should try and start the process of getting this merged.
The first question I would have before doing a full review is how we could expand this to include conditional reassignments. I think it would be every important to support the following code pattern:
def func():
if True:
to_infer = 1
else:
to_infer = 2
print(to_infer)
This does not need to be added in this PR, but we should at least have a general idea of how we could implement this within the same system. Do/did you have any ideas about this yourself?
@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!
Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things in InferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.
However as far I as can remember, existing InferenceContexts don't get passed across statements in a block. I think this is fundamentally why pylint's VariablesChecker is as complex as it is?
But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in the VariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.
Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)
@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!
Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way
InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things inInferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.However as far I as can remember, existing
InferenceContextsdon't get passed across statements in a block. I think this is fundamentally why pylint'sVariablesCheckeris as complex as it is?But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these
InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in theVariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)
I think that answers it pretty clearly. As you can see I spent some time familiarising myself with the code 😄 I think this could indeed be a very good first step towards some better control flow. I have been thinking about doing something with frames and line numbering to do some more difficult constraints, but at the moment I don't see how we couldn't change the current approach to do that. It all seems flexible enough.
Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.
Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than
If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.
Will do. I'll try to look at this more closely tonight or tomorrow. 👍
Pull Request Test Coverage Report for Build 3586640247
- 92 of 92 (100.0%) changed or added relevant lines in 4 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.06%) to 92.365%
| Totals | |
|---|---|
| Change from base Build 3586270619: | 0.06% |
| Covered Lines: | 9980 |
| Relevant Lines: | 10805 |
💛 - Coveralls
@DanielNoord Okay I reviewed all of your changes, they look good to me! Yeah I was going for more general types but happy to stay strict for now and expand later if needed. Also thanks for making that Python 3.7 fix, that would have taken me a while to track down 😅.
Anyway as far as I can tell, this is good from my end, just waiting for more detailed review/feedback.
@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!
@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!
If you're ok with waiting a bit longer, I could take a look in about 2-3 weeks. Will have much more time then to catch up on everything.
@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!
If you're ok with waiting a bit longer, I could take a look in about 2-3 weeks. Will have much more time then to catch up on everything.
Fine with me! Should be around the time that I finish with argparse so I'll have time to assist with anything that comes up then 😄
@DanielNoord Sounds good to me!
Yikes, I'm sorry I totally missed the last review from months ago! 😞
I'm happy to make all the requested changes later today, shouldn't take long.
Sorry for the commit spam, I'm currently only running 3.10 locally. If this continues I'll fix it properly...
Thanks for the review @cdce8p! I'll make these updates in the next couple of days for sure.
Hello! I apologize for the delay again in making these updates (just have had a lot of other things going on). Notably, I've moved the constraint code back into a single file (constraint.py) and removed the __all__ and instead used the underscore prefix where appropriate.
I handled the review by Marc and incorporate his comments to help move this PR along. I think we're all eager to get this merged 😄
Thanks @DanielNoord and @cdce8p! I wasn't sure exactly when I'd have time to look at this again so very much appreciate the updates. 👍
BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "AndConstraint" from
if x is not None and isinstance(x, int):
... x ...
and so the if node's test is an and of two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a single AndConstraint whose children are the two constraints. (Personally I'd prefer going with the latter approach.)
BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "
AndConstraint" fromif x is not None and isinstance(x, int): ... x ...and so the if node's
testis anandof two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a singleAndConstraintwhose children are the two constraints. (Personally I'd prefer going with the latter approach.)
I think the concept of AndConstraint makes sense. This would remove the need for the yield behaviour.
I implemented the yielding behaviour for now but that could easily be reverted. @cdce8p what do you think?
Codecov Report
Merging #1189 (46c056f) into main (f476ebc) will increase coverage by
0.05%. The diff coverage is100.00%.
Additional details and impacted files
@@ Coverage Diff @@
## main #1189 +/- ##
==========================================
+ Coverage 92.58% 92.63% +0.05%
==========================================
Files 93 94 +1
Lines 10782 10869 +87
==========================================
+ Hits 9982 10069 +87
Misses 800 800
| Flag | Coverage Δ | |
|---|---|---|
| linux | 92.40% <100.00%> (+0.06%) |
:arrow_up: |
| pypy | 88.54% <96.73%> (+0.05%) |
:arrow_up: |
| windows | 92.31% <100.00%> (+0.06%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| astroid/bases.py | 87.98% <100.00%> (+0.38%) |
:arrow_up: |
| astroid/constraint.py | 100.00% <100.00%> (ø) |
|
| astroid/context.py | 97.33% <100.00%> (+0.11%) |
:arrow_up: |
| astroid/inference.py | 94.54% <100.00%> (+0.03%) |
:arrow_up: |
Thanks to all for reviews and persistence--huge step forward!
💯 @Pierre-Sassoulas thank you! And thank you all for the efforts in reviewing this and keeping the branch up to date. I'm super excited to see how this evolves!