LibCST icon indicating copy to clipboard operation
LibCST copied to clipboard

The example code in document of `MatchMetadataIfTrue` doesn't work as expected

Open MapleCCC opened this issue 4 years ago • 3 comments

The document of MatchMetadataIfTrue suggests an example code to match against any Name node for the identifier foo as long as that identifier is found at the beginning of an unindented line. I tried to reproduce the example code:

import libcst as cst
import libcst.matchers as m
from libcst.metadata import PositionProvider

source = """
foo
"""

module = cst.parse_module(source)

names = m.findall(
    module,
    m.Name(
        value="foo",
        metadata=m.MatchMetadataIfTrue(
            PositionProvider, lambda position: position.start.column == 0
        ),
    ),
)

print(names)  # this statement prints an empty list, which it should not

I follow the example code verbatim, yet the result is not as expected. The names list should contain one cst.Name node, yet it's empty. The source code of LibCST is so voluminous that I can't figure out where the bug originates. I am presenting the bug here, so that someone familiar with the LibCST codebase can kindly take a look.

MapleCCC avatar Oct 01 '21 12:10 MapleCCC

Interesting, so this example only works when you pass in a MetadataWrapper as findall()'s first parameter, so:

import libcst as cst
import libcst.matchers as m
from libcst.metadata import PositionProvider

source = """
foo
"""

module = cst.MetadataWrapper(cst.parse_module(source))

names = m.findall(
    module,
    m.Name(
        value="foo",
        metadata=m.MatchMetadataIfTrue(
            PositionProvider, lambda position: position.start.column == 0
        ),
    ),
)

print(names)

This probably has something to do with looking up node metadata by reference, but I can't pinpoint where exactly the issue is happening.

zsol avatar Oct 01 '21 19:10 zsol

So I inspect the source code, and figure out that this happens because, when the first parameter of findall() is not a MetadataWrapper, findall() internally uses a stub metadata_lookup that says no to any metadata lookup. And this leads to no matching at all because the document of MatchMetadataIfTrue states that "If the metadata value does not exist for a particular node, MatchMetadataIfTrue will always be considered not a match".

While this implementation logic is not technically wrong, it may be a better idea to just fail the call instead, so that users are WARNED/AWARE/REQUESTED that they should always use a MetadataWrapper when MatchMetadata/MatchMetadataIfTrue are used, instead of just returning a possibly-bug-hidden empty result.

MapleCCC avatar Oct 04 '21 15:10 MapleCCC

FWIW, these two lines are where the stub metadata_lookup is picked:

https://github.com/Instagram/LibCST/blob/13485d3c2f5bb3e4b6219dac6ba7bb08c851f21f/libcst/matchers/_matcher_base.py#L1676-L1677

MapleCCC avatar Oct 04 '21 15:10 MapleCCC

I think it's good to assume that when a user supplies a metadata-based matcher, such as MatchMetadataIfTrue(PositionProvider, xxx), he almost always intends for that metadata provider, such as PositionProvider, to be resolved during match. Could we reach consensus on this depiction of user intention?

If we all agree on this, then I believe we should raise a helpful exception under the scenario of unresolved metadata, instead of silently ignore the missing metadata, which could secretly hide potential errors.

To add to the argument, this is also not the first place LibCST tries to warn user about unresolved metadata provider. For example, MetadataDependent.get_metadata() raises exception that asks user xxx is a dependency, but not set; did you forget a MetadataWrapper? [1]. I believe the same design rationale is also applicable here.

LibCST also has a overarching design philosophy that it tries to be explicit over implicit, and tries to make easily-slip-through errors loud instead of potentially silently hidden. For example, LibCST forces developers to use RemovalSentinel instead of None to signal deletions, because "...We use this instead of None to force developers to be explicit about deletions. Because None is the default return value for a function with no return statement, it would be too easy to accidentally delete nodes from the tree by forgetting to return a value..." [2].

To be more concrete, I propose to change the following lines in _construct_metadata_fetcher_null() [3]:

  def _construct_metadata_fetcher_null() -> Callable[
      [meta.ProviderT, libcst.CSTNode], object
  ]:
-     def _fetch(*args: object, **kwargs: object) -> object:
-         return _METADATA_MISSING_SENTINEL
+     def _fetch(provider: meta.ProviderT, node: libcst.CSTNode) -> NoReturn:
+         raise LookupError(f"{provider.__name__} is not resolved; did you forget a MetadataWrapper?")

      return _fetch

I think this change adds to the ergonomic and user-friendliness of LibCST. Please feel free to voice out your thoughts on this.

MapleCCC avatar Aug 26 '22 08:08 MapleCCC

Yep, all of this makes sense to me. Happy to accept a PR that will do this. We should roll this out as a minor version bump and be loud in the changelog that this is a breaking change.

zsol avatar Aug 26 '22 17:08 zsol

@zsol Cool! Thank you for the positive comment! A PR #757 has been opened to address the proposed changes.

A side note: I am not too concerned about this being a breaking change. Since there are so few bug reports around this, we can safely assume that this is statistically infrequent. Also even if it does break user's code, the breaking is benevolent, helping user surface up legitimate matches previously overlooked.

MapleCCC avatar Aug 27 '22 06:08 MapleCCC