astroid icon indicating copy to clipboard operation
astroid copied to clipboard

Improve consistency of JoinedStr inference

Open ericvergnaud opened this issue 1 year ago • 8 comments

This PR:

  • improves consistency of JoinedStr inference by always returning either Uninferable or a fully inferred Const
  • optionally, using a flag on JoinedStr, allows returning a partially inferred string such as "a/{MISSING_VALUE}/b" when inferring f"a/{missing}/b" and missing cannot be inferred.

Type of Changes

| ✓ | 🐛 Bug fix | ✓ | :sparkles: New feature |

Description

See above

Closes #2621

ericvergnaud avatar Oct 22 '24 17:10 ericvergnaud

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.21%. Comparing base (e380fd1) to head (4e9680c). Report is 48 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2622      +/-   ##
==========================================
- Coverage   93.24%   93.21%   -0.04%     
==========================================
  Files          93       93              
  Lines       11060    11076      +16     
==========================================
+ Hits        10313    10324      +11     
- Misses        747      752       +5     
Flag Coverage Δ
linux 93.10% <100.00%> (-0.04%) :arrow_down:
pypy 93.21% <100.00%> (-0.04%) :arrow_down:
windows 93.21% <100.00%> (-0.04%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
astroid/nodes/node_classes.py 94.93% <100.00%> (-0.21%) :arrow_down:

codecov[bot] avatar Oct 22 '24 17:10 codecov[bot]

I don't really understand your use case: can't we just return Uninferable when we can't infer the full string? That seems consistent with other nodes and is also the suggestion I already made when inference of JoinedStr was first added.

Returning Uninferable is not helpful in some usage scenarios. Consider the following example:

def method_with_arg(arg):
    spark.table(f"/mnt/root/dir/{arg}/logs")

We want to tell our users that reading spark tables from mounts is no longer supported. With JoinedStr.inferred() returning Uninferable, we're unable to detect that. The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users. Happy to place the flag elsewhere.

ericvergnaud avatar Oct 22 '24 21:10 ericvergnaud

The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users.

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

>>> from astroid import extract_node
>>> n = extract_node("""
... def method_with_arg(arg):
...     spark.table(f"/mnt/root/dir/{arg}/logs")
... """)
>>> from astroid import nodes
>>> js = list(n.nodes_of_class(nodes.JoinedStr))[0]
>>> js.values[0].value
'/mnt/root/dir/'

I'll comment further on the issue.

jacobtylerwalls avatar Oct 23 '24 02:10 jacobtylerwalls

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

Oh, I see, it's so that both cases (mnt injected by the formatted value) can be handled by one call. I guess I'm not sure astroid needs to support that.

jacobtylerwalls avatar Oct 23 '24 03:10 jacobtylerwalls

Personally I feel like this use case is best served with a dedicated method, or an implementation in the pylint plugin itself. There just seem to be too many edge cases where the behaviour of the flag would be hard to define.

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

DanielNoord avatar Oct 23 '24 07:10 DanielNoord

The feature proposed in this PR makes it possible, optionally, to return "/mnt/root/dir/{MISSING_VALUE}/logs", which is what we need to detect the "/mnt/" prefix, and also provide better feedback to users.

I guess l'm wondering why you would detect that by calling inferred(), wouldn't you just directly inspect the AST?

>>> from astroid import extract_node
>>> n = extract_node("""
... def method_with_arg(arg):
...     spark.table(f"/mnt/root/dir/{arg}/logs")
... """)
>>> from astroid import nodes
>>> js = list(n.nodes_of_class(nodes.JoinedStr))[0]
>>> js.values[0].value
'/mnt/root/dir/'

I'll comment further on the issue.

Your proposal is not workable with the following code:

def method_with_arg(arg):
    path = f"/mnt/root/dir/{arg}/logs"
    spark.table(path)

not speaking of even more complex scenarios:

def method_with_arg(arg):
    paths = [f"/mnt/root/dir/{arg}/logs", "/some/other/path"]
    tables = [spark.table(path) for path in paths]

That's what makes node.inferred() so valuable!

ericvergnaud avatar Oct 23 '24 08:10 ericvergnaud

Personally I feel like this use case is best served with a dedicated method, or an implementation in the pylint plugin itself. There just seem to be too many edge cases where the behaviour of the flag would be hard to define.

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

We're using astroid directly so a pylint plugin is not an option. I agree with the above philosophy in theory, but I disagree in practice because it stops me from getting the outcome I need from astroid. We're using it for inference (we could do parsing and transforms using ast) I guess it's legitimate for users to want to customise behaviour a bit, depending on their requirements. A better design would be to provide a replaceable NodeFactory, such that users can provide their own implementation of JoinedStr. Happy to provide such PR if it's welcome.

ericvergnaud avatar Oct 23 '24 08:10 ericvergnaud

I don't think I want the flag to be a thing on the class. astroid is not really configurable and I think that is actually beneficial in terms of maintainability.

I don't really understand your use case: can't we just return Uninferable when we can't infer the full string? That seems consistent with other nodes and is also the suggestion I already made when inference of JoinedStr was first added.

TBH this PR is not proposing to change the current behaviour (see FAIL_ON_UNINFERABLE = True), rather it makes it possible to customise it if required.

ericvergnaud avatar Oct 23 '24 08:10 ericvergnaud

@jacobtylerwalls @DanielNoord @Pierre-Sassoulas can you opine on the factory proposal ?

ericvergnaud avatar Oct 24 '24 07:10 ericvergnaud

I think .infer() should just have one behaviour which should be both easily explainable and matching with what we do for other node types. Any specific JoinedStr behaviour should probably not be part of a generic interface.

Agree. But if we know it's a string we can return a string even if it's not 'perfect'.

Still lightly in favor of inferring string the best we can even if we have to include 'Uninferrable' in the middle (/mnt/{Uninferable}/...). But it should be safeguarded, i.e. :

  • f"/mnt{uninferrable}/..." => we infer a str : /mnt/{Uninferable}/...
  • "/mnt" + uninferrable + "/..." => there's no cast to string for uninferrable so we might have a crash and we need to still detect it

Pierre-Sassoulas avatar Oct 24 '24 18:10 Pierre-Sassoulas

I guess I'm not certain we even know it's a string. Don't these uninferable cases include failures to format?

jacobtylerwalls avatar Oct 24 '24 20:10 jacobtylerwalls

Exactly, or they can include memory overflow issues, or calls to functions that crash or exit the interpreter.

I don't really see why we would special case this infer. If we infer an import we also know that it will be a module but we don't insert dummy module nodes if we can't infer the import.

DanielNoord avatar Oct 24 '24 21:10 DanielNoord

Let's remove the flag and return fully inferred const or uninferable then.

Pierre-Sassoulas avatar Oct 24 '24 21:10 Pierre-Sassoulas

That doesn't solve my problem. Any comments on the node factory proposal ?

ericvergnaud avatar Oct 25 '24 07:10 ericvergnaud

I think without seeing the code to do this we can't really comment on it. Bigger refactors of astroid have sadly failed before so I'm wary of rewriting how we construct nodes.

Can't you solve your use cases with a function that takes these nodes? Potentially by making that a method in astroid. Although you're not writing a pylint plug-in your use case does seem to imply you know what kind of node you're dealing with before calling infer.

DanielNoord avatar Oct 25 '24 11:10 DanielNoord

I think without seeing the code to do this we can't really comment on it. Bigger refactors of astroid have sadly failed before so I'm wary of rewriting how we construct nodes.

Can't you solve your use cases with a function that takes these nodes? Potentially by making that a method in astroid. Although you're not writing a pylint plug-in your use case does seem to imply you know what kind of node you're dealing with before calling infer.

This was just an example. We have dozens of use cases where a partial value is preferable to Uninferable, and the node type could be anything (Name, Attribute).

Re node factory, I'm not looking for pre-approval of a PR, rather I'm not interested in investing time providing one if philosophical concerns make it a no-go. I guess the above means 'possibly ok' ?

ericvergnaud avatar Oct 25 '24 14:10 ericvergnaud

Could you provide more? I'm still leaning towards infer() should be correct or Uninferable but I'm definitely open to suggestions. I just find it hard to wrap my head around how we could introduce this behaviour change without breaking expectations everywhere.

I'm wondering if try_to_infer() would be a nice extension of the Protocol.

"Possibly ok" is correct in the sense that we never reject proposals by default. I must warn though that maintainers time on astroid is very limited so grand refactors of the codebase might be hard to pull off.

DanielNoord avatar Oct 25 '24 14:10 DanielNoord

A node factory would transfer the responsibility of partial inference behavior to astroid users (via a dedicated class deriving from JoinedStr) rather than have it with the astroid team. From there the builtin JoinedStr.infer() would indeed be either correct or Uninferable, but astroid users would be able to override the behavior at no cost for astroid and astroid team. I suspect the refactoring would be small in essence (impacting mostly the parser) but may span minor changes across many files.

ericvergnaud avatar Nov 04 '24 16:11 ericvergnaud

I'm still not convinced the effort is worth the result, especially considering the alternatives seems much nicer to me:

  1. Make this part of the package you are working on to get full control
  2. Add a 'partial infer' method that has the behaviour you're looking for

DanielNoord avatar Nov 05 '24 22:11 DanielNoord

I'm still not convinced the effort is worth the result, especially considering the alternatives seems much nicer to me:

  1. Make this part of the package you are working on to get full control
  2. Add a 'partial infer' method that has the behavior you're looking for

Not sure what you mean by 1. ? Can you explain ? The last thing we want to do is fork astroid...

Re partial_infer, such a method would have to be available on all node types since the JoinedStr may not be immediately reachable, see for example:

def print_stuff(a):
    b = f"Hello {a}!"
    print(b)

In the above, b is a Name, not a JoinedStr, so we'd need a partial_infer method on Name, that would call the partial_infer method of its value... Not sure it's nicer... A node factory has the immense benefit of allowing all sorts of overrides, not just this one...

ericvergnaud avatar Nov 06 '24 14:11 ericvergnaud

Looks like this PR is not going o make it, despit having addressed the comments, and narrowed the scope to bug fixing ?

ericvergnaud avatar Nov 13 '24 19:11 ericvergnaud

Please wait a little before falling into despair, we're volunteers with limited time, no fast reviews do not mean it's going to be rejected only that we did not find the time to review yet ;)

Pierre-Sassoulas avatar Nov 13 '24 19:11 Pierre-Sassoulas

Re the factory proposal, as far as I understand, we have the ability to do that already:

>>> import astroid
>>> def my_transform(node):
...     def _infer_with_values(self, context=None, **kwargs):
...         1/0  # replace with your override
...     node._infer_with_values = _infer_with_values
...     return node
...     
>>> astroid.MANAGER.register_transform(astroid.nodes.JoinedStr, my_transform)
>>> js = astroid.extract_node("f'{missing}'")
>>> js.inferred()
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    js.inferred()
    ~~~~~~~~~~~^^
  File "/Users/jwalls/astroid/astroid/nodes/node_ng.py", line 585, in inferred
    return list(self.infer())
  File "/Users/jwalls/astroid/astroid/nodes/node_ng.py", line 167, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
                     ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/astroid/astroid/nodes/node_classes.py", line 4785, in _infer
    yield from self._infer_with_values(context)
               ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "<python-input-1>", line 3, in _infer_with_values
    1/0
    ~^~
ZeroDivisionError: division by zero

jacobtylerwalls avatar Nov 17 '24 14:11 jacobtylerwalls

Agree with Jacob here, thanks for taking in the review and changing the PR. I don't have time for a review: @jacobtylerwalls would you be okay with merging this?

@ericvergnaud We would first need to rebase to prevent the conflicts.

DanielNoord avatar Nov 18 '24 08:11 DanielNoord

Agree with Jacob here, thanks for taking in the review and changing the PR. I don't have time for a review: @jacobtylerwalls would you be okay with merging this?

@ericvergnaud We would first need to rebase to prevent the conflicts.

The PR says there are no conflicts, so not sure why rebasing would be needed ?

ericvergnaud avatar Nov 18 '24 11:11 ericvergnaud

Thank you for you work and persistence on this @ericvergnaud !

Pierre-Sassoulas avatar Nov 18 '24 19:11 Pierre-Sassoulas