astroid icon indicating copy to clipboard operation
astroid copied to clipboard

Support "is None" constraints from if statements during inference

Open david-yz-liu opened this issue 4 years ago • 14 comments

Steps

  • [x] For new features or bug fixes, add a ChangeLog entry describing what your PR does.
  • [x] Write a good description on what the PR does.

Description

Alright, this is a big one so please don't hold back. This PR adds infrastructure to support constraints on values during inference. Here's a simple example:

from astroid import extract_node

node = extract_node("""
def f(b):
    if b:
        x = 1
    else:
        x = None
    
    if x is not None:
        x #@
""")
print(node.inferred())

This prints [<Const.int l.4 at 0x14232a6e6d0>, <Const.NoneType l.6 at 0x142329b38b0>], using both possible values from the assignment statements. Of course, because of the x is not None if condition, only the int is possible, but astroid currently doesn't take this condition into account.

This PR changes how inference works so that only [<Const.int l.4 at 0x14232a6e6d0>] is printed.

Scope

I reviewed the existing pylint issues related to this (they are generally labelled as "topic-control-flow"). There were five types of constraints that were brought up:

  1. x is None/x is not None
  2. x/not x (truthy/falsey value)
  3. isinstance(x, ...)
  4. x == ...
  5. Combinations of the above with and/or

There were also six primary locations where these constraints could be inferred (orthogonal to the constraints themselves):

  1. if statement (as in above example)

  2. and/or clauses. For example, x is not None and x[0] == 1; the second x should not be inferred as None

  3. ternary if expressions. For example, x[0] if x is not None else 1; the first x should not be inferred as None

  4. reassignment within an if statement, e.g.

    def f():
        a = None
        if a is None:
            a = 1
        return -a
    
  5. early return/raises within an if statement.

  6. assert statements

This pull request addresses just the item 1's in each list. I used an approach that can handle 1-5 of the constraint types, and 1-3 of the locations. If this approach is approved, I think it would be pretty straightforward to make additional PRs for these items. (More on the other constraint locations below.)

Implementation overview

I added a new Constraint class to represent a constraint on a variable, like x is not None.

  • This is an abstract class, and can be subclassed for different constraint types.

When a variable is inferred (see inference.py, infer_name), we search the ancestor chain from the node and look for containing if statements, using these to accumulate constraints on the variable. These constraints are added to the inference context (added a new attribute to InferenceContext).

Then, in _infer_stmts (bases.py), we filter the inferred values based on whether or not they satisfy all constraints for that variable.

  • Uninferable is considered to satisfy all constraints.
  • ⚠️ If there is at least one inferred value, but all inferred values are filtered out, a new Uninferable value is yielded, rather than yielding nothing at all. I think this is the conservative approach, to avoid errors from the raise_if_nothing_inferred decorator.

The above also applies to attributes (see infer_attribute).

Some comments/questions

  1. Because constraints are gathered by searching up the ancestor chain, this PR doesn't handle real control flow issues, e.g. with the reassignment example from above.
  2. Because the filtering occurs in _infer_stmts, it only applies to variables and attributes. So for example, subscript nodes (e.g. my_array[0]) won't have their inferred values filtered.
  3. Gathering constraints occurs every time infer_name/infer_attribute is called, and since this requires going up the ancestor tree, this adds some overhead to inference. In principle constraints can be gathered through an AST visitor, but I thought that would be more complex so didn't choose it. (Thoughts welcome of course!)
  4. I think adding the constraints to InferenceContext is correct, but to be honest I'm fuzzy on the nuances of this class and how it's used.
  5. Do you like the name "constraint" here? I go back on forth on it, but haven't come up with anything better.

Type of Changes

Type
:sparkles: New feature

Related Issue

Closes with test #791 Closes with test PyCQA/pylint#157 Closes with test PyCQA/pylint#1472 Closes with test PyCQA/pylint#2016 Closes with test PyCQA/pylint#2631 Closes with test PyCQA/pylint#2880

david-yz-liu avatar Sep 24 '21 02:09 david-yz-liu

Hi @david-yz-liu, thanks for all of the work here. I think the size (and importance) of this PR has put off people from reviewing this but I think we should try and start the process of getting this merged.

The first question I would have before doing a full review is how we could expand this to include conditional reassignments. I think it would be every important to support the following code pattern:

def func():
    if True:
        to_infer = 1
    else:
        to_infer = 2

    print(to_infer)

This does not need to be added in this PR, but we should at least have a general idea of how we could implement this within the same system. Do/did you have any ideas about this yourself?

DanielNoord avatar Dec 19 '21 19:12 DanielNoord

@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!

Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things in InferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.

However as far I as can remember, existing InferenceContexts don't get passed across statements in a block. I think this is fundamentally why pylint's VariablesChecker is as complex as it is?

But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in the VariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.

Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)

david-yz-liu avatar Mar 21 '22 20:03 david-yz-liu

@DanielNoord wow, thank you! So sorry for the delay, I think I saw your message and meant to reply months ago but then... forgot. My bad!

Your question is a great one. It's been a while since I've thought about this, but I'll try to give a decent answer now (and perhaps a more thorough one later). I believe my current approach piggybacks off of the way InferenceContexts are passed around when doing inference, so the general rule for what this PR can or can't do is "we can accumulate Contraints whenever we can accumulate other things in InferenceContexts". The examples I gave in the PR description are places where a context is passed from a parent down to a child node in the AST, so they're pretty straightforward.

However as far I as can remember, existing InferenceContexts don't get passed across statements in a block. I think this is fundamentally why pylint's VariablesChecker is as complex as it is?

But anyway to answer your question, I think that this PR can be extended to handle the example you provided, but it wouldn't be by adding to my code very much, but rather by changing how these InferenceContexts (or some other "context" state) is passed around, using techniques similar to what's happening in the VariablesChecker. Or, using control flow graphs (😅) to precisely model the possible execution paths, and then accumulating Contraints across paths in the graph.

Please let me know if that answers your question, or if there are any follow-ups! Happy to be of service :)

I think that answers it pretty clearly. As you can see I spent some time familiarising myself with the code 😄 I think this could indeed be a very good first step towards some better control flow. I have been thinking about doing something with frames and line numbering to do some more difficult constraints, but at the moment I don't see how we couldn't change the current approach to do that. It all seems flexible enough.

Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.

DanielNoord avatar Mar 21 '22 21:03 DanielNoord

Let me know what you think of my changes. I made the tests pass on 3.7 and made some of the typing a little more strict. I appreciate that you had the constraints allow for other nodes than If, but for ease of later expansion I think it makes sense to keep it tighter now and expand the typing if needed later on.

Will do. I'll try to look at this more closely tonight or tomorrow. 👍

david-yz-liu avatar Mar 21 '22 21:03 david-yz-liu

Pull Request Test Coverage Report for Build 3586640247

  • 92 of 92 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 92.365%

Totals Coverage Status
Change from base Build 3586270619: 0.06%
Covered Lines: 9980
Relevant Lines: 10805

💛 - Coveralls

coveralls avatar Mar 22 '22 05:03 coveralls

@DanielNoord Okay I reviewed all of your changes, they look good to me! Yeah I was going for more general types but happy to stay strict for now and expand later if needed. Also thanks for making that Python 3.7 fix, that would have taken me a while to track down 😅.

Anyway as far as I can tell, this is good from my end, just waiting for more detailed review/feedback.

david-yz-liu avatar Mar 23 '22 01:03 david-yz-liu

@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!

DanielNoord avatar Apr 07 '22 20:04 DanielNoord

@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!

If you're ok with waiting a bit longer, I could take a look in about 2-3 weeks. Will have much more time then to catch up on everything.

cdce8p avatar Apr 07 '22 20:04 cdce8p

@cdce8p Do you still want to review this before we merge this? I think with my last comments addressed this is almost good to go!

If you're ok with waiting a bit longer, I could take a look in about 2-3 weeks. Will have much more time then to catch up on everything.

Fine with me! Should be around the time that I finish with argparse so I'll have time to assist with anything that comes up then 😄

DanielNoord avatar Apr 07 '22 20:04 DanielNoord

@DanielNoord Sounds good to me!

cdce8p avatar Apr 07 '22 20:04 cdce8p

Yikes, I'm sorry I totally missed the last review from months ago! 😞

I'm happy to make all the requested changes later today, shouldn't take long.

david-yz-liu avatar Jun 17 '22 15:06 david-yz-liu

Sorry for the commit spam, I'm currently only running 3.10 locally. If this continues I'll fix it properly...

david-yz-liu avatar Jun 17 '22 20:06 david-yz-liu

Thanks for the review @cdce8p! I'll make these updates in the next couple of days for sure.

david-yz-liu avatar Jun 29 '22 18:06 david-yz-liu

Hello! I apologize for the delay again in making these updates (just have had a lot of other things going on). Notably, I've moved the constraint code back into a single file (constraint.py) and removed the __all__ and instead used the underscore prefix where appropriate.

david-yz-liu avatar Jul 25 '22 18:07 david-yz-liu

I handled the review by Marc and incorporate his comments to help move this PR along. I think we're all eager to get this merged 😄

DanielNoord avatar Nov 20 '22 23:11 DanielNoord

Thanks @DanielNoord and @cdce8p! I wasn't sure exactly when I'd have time to look at this again so very much appreciate the updates. 👍

BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "AndConstraint" from

if x is not None and isinstance(x, int):
    ... x ...

and so the if node's test is an and of two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a single AndConstraint whose children are the two constraints. (Personally I'd prefer going with the latter approach.)

david-yz-liu avatar Nov 21 '22 02:11 david-yz-liu

BTW on the question of whether multiple constraint classes may match a given node (in the future), I think... maybe but not necessarily. For example, if we imagine a hypotheical "AndConstraint" from

if x is not None and isinstance(x, int):
    ... x ...

and so the if node's test is an and of two expressions, I could see either yielding two distinct constraints ("not none constraint" and "is int instance" constraint), or a single AndConstraint whose children are the two constraints. (Personally I'd prefer going with the latter approach.)

I think the concept of AndConstraint makes sense. This would remove the need for the yield behaviour.

I implemented the yielding behaviour for now but that could easily be reverted. @cdce8p what do you think?

DanielNoord avatar Nov 21 '22 21:11 DanielNoord

Codecov Report

Merging #1189 (46c056f) into main (f476ebc) will increase coverage by 0.05%. The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
+ Coverage   92.58%   92.63%   +0.05%     
==========================================
  Files          93       94       +1     
  Lines       10782    10869      +87     
==========================================
+ Hits         9982    10069      +87     
  Misses        800      800              
Flag Coverage Δ
linux 92.40% <100.00%> (+0.06%) :arrow_up:
pypy 88.54% <96.73%> (+0.05%) :arrow_up:
windows 92.31% <100.00%> (+0.06%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
astroid/bases.py 87.98% <100.00%> (+0.38%) :arrow_up:
astroid/constraint.py 100.00% <100.00%> (ø)
astroid/context.py 97.33% <100.00%> (+0.11%) :arrow_up:
astroid/inference.py 94.54% <100.00%> (+0.03%) :arrow_up:

codecov[bot] avatar Jan 06 '23 19:01 codecov[bot]

Thanks to all for reviews and persistence--huge step forward!

jacobtylerwalls avatar Jan 06 '23 20:01 jacobtylerwalls

💯 @Pierre-Sassoulas thank you! And thank you all for the efforts in reviewing this and keeping the branch up to date. I'm super excited to see how this evolves!

david-yz-liu avatar Jan 06 '23 20:01 david-yz-liu