typeguard icon indicating copy to clipboard operation
typeguard copied to clipboard

Syntax error for custom type annotations

Open Qwlouse opened this issue 10 months ago • 17 comments

Things to check first

  • [x] I have searched the existing issues and didn't find my bug already reported there

  • [x] I have checked that my bug is still present in the latest release

Typeguard version

4.4.1

Python version

3.11

What happened?

typeguard.typechecked fails with a syntax error for annotations that use item access with a string that is not eval-able. Basic example:

import typeguard
annot = {"a b": int, "c d": str}

@typeguard.typechecked
def foo1(x: annot["a b"]) -> int:
  return 10

yields

File "<unknown>", line 1
    a b
      ^
SyntaxError: invalid syntax

This problem occurs with custom annotation types like for example jaxtyping which uses annotations like Float[np.ndarray, "H W C"] to annotate shape information. The problem was previously raised in #353 for the special case of Annotated. The fix from be4dd33 solves this for plain Annotated, but is quite brittle. For example this fails:

import typing

MyAnnotated = typing.Annotated  # simple alias

@typeguard.typechecked
def foo2(x: MyAnnotated[int, "a b"]) -> int:
  return 10

The problem in both cases is that the instrumentation tries to evaluate "a b" as a forward reference which leads to a syntax error. I don't think this is the desired behaviour. Custom annotations are allowed by python and should not lead to errors. For reference, the typing.get_type_hints() function succeeds in both cases:

typing.get_type_hints(foo1)
# {'x': int, 'return': int}

typing.get_type_hints(foo2)
# {'x': int, 'return': int}

I suggest to solve this problem by changing the instrumented code for a function to use typing.get_type_hints() rather than trying to manually catch all corner cases during the AST parsing. Currently the instrumented code for a simple function looks like this:

def bar(x: int, y: str) -> None:
  ...
  return

# ---- instrumented ---
def bar(x: int, y: str) -> None:
    from typeguard import TypeCheckMemo
    from typeguard._functions import check_argument_types, check_return_type
    memo = TypeCheckMemo(globals(), locals())
    check_argument_types('bar', {'x': (x, int), 'y': (y, str)}, memo)
    ...
    return check_return_type('bar', None, None, memo)

The problem could be fixed if instead trying to resolve the forward references during AST parsing, the check_arguments_types function simply called typing.get_type_hints(). So with a new check_argument_types_v2:

def check_argument_types_v2(fun, args, memo):
  name = fun.__name__
  hints = typing.get_type_hints(fun)
  annotated_arguments = {
        k: (v, hints[k])
        for k, v in args.items()
        if k in hints
  }
  return check_argument_types(name, annotated_arguments, memo)

The instrumented code could become:

def bar(x: int, y: str) -> None:
    from typeguard import TypeCheckMemo
    from typeguard._functions import check_argument_types, check_return_type
    memo = TypeCheckMemo(globals(), locals())
    check_argument_types_2(bar, {'x': x, 'y': y}, memo)
    ...
    return check_return_type('bar', None, None, memo)  # (should probably do the same here)

This should avoid all the above edge-cases and be pretty straightforward to implement. I have a proof-of-concept working. Happy to submit it as a PR if that is wanted (though I am not very familiar with the AST-parsing code, and I don't feel confident in my edits).

How can we reproduce the bug?

import typeguard
annot = {"a b": int}

@typeguard.typechecked
def foo1(x: annot["a b"]) -> int:
  return 10

Qwlouse avatar Feb 19 '25 19:02 Qwlouse

I think there could be a compromise: using get_type_hints() when an annotation involving a string is present (excluding the use of typing.Literal). While some projects still use a lot of string-based annotations, I'm seeing them less and less, making this likely a non-issue.

agronholm avatar Feb 20 '25 00:02 agronholm

Wow, thanks for the quick response. A compromise solution would be possible I agree. But I do not see any advantage. To me it seems that the get_type_hints approach would be more robust without any disadvantages. What downsides do you see with the get_type_hints approach? Am I missing something?

I guess one problem that I can see is that this only works for the signature of the function and not for variable annotations in the function body. So fixing this for variable annotations might require more work.

Qwlouse avatar Feb 26 '25 12:02 Qwlouse

The downside is that the results can't be cached. This is the reason why I was so aggressive in removing the quotes. But performance should never come at the cost of correctness.

agronholm avatar Feb 26 '25 12:02 agronholm

Ah, I see. Performance concerns are a good point. get_type_hints can be quit slow. But why couldn't the results be cached? Is there a problem with something like this?

def check_argument_types_v2(fun, args, memo):
  name = fun.__name__
  if not hints := getattr(fun, `_cached_type_hints`, None):
      hints = typing.get_type_hints(fun)
      fun._cached_type_hints = hints

  annotated_arguments = {
        k: (v, hints[k])
        for k, v in args.items()
        if k in hints
  }
  return check_argument_types(name, annotated_arguments, memo)

Qwlouse avatar Feb 26 '25 13:02 Qwlouse

Where do you get fun from?

agronholm avatar Feb 26 '25 13:02 agronholm

Inside a function you have access to the function object. So I would just pass that along to the check_argument_types function.

def foo(x: int):
  check_argument_types_v2(foo, {"x": x})

Qwlouse avatar Feb 26 '25 13:02 Qwlouse

Inside a function you have access to the function object.

No, you don't. I cannot stress how much easier my life would've been if this was the case!

agronholm avatar Feb 26 '25 13:02 agronholm

I have good news for you then. Because this works. Try it. 😉

def foo():
  print(foo)

foo()

https://stackoverflow.com/questions/852401/python-getting-a-reference-to-a-function-from-inside-itself

Qwlouse avatar Feb 26 '25 13:02 Qwlouse

I have good news for you then. Because this works. Try it. 😉

def foo(): print(foo)

foo()

https://stackoverflow.com/questions/852401/python-getting-a-reference-to-a-function-from-inside-itself

Did you honestly think I never tried this before? Trivial cases are not useful to me. Consider the following:

def outer() -> None:
    def inner(x: int, y: str) -> None:
        pass

    return inner

outer()(1, "foo")

Now please explain how I can get a reference to the inner() function from inside it. Another example:

def wrapper(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)

    return inner

@wrapper
def real_function(x: int, y: int) -> None:
    pass

This is another common pattern, and from within real_function(), the real_function name only gets you the wrapper, not the actual function where the annotations are.

agronholm avatar Feb 27 '25 00:02 agronholm

Oh, I see. I can see now that I have underestimated both you and the problem. I apologize! Thank you for explaining.

Qwlouse avatar Feb 27 '25 13:02 Qwlouse

Oh, I see. I can see now that I have underestimated both you and the problem. I apologize! Thank you for explaining.

To be fair, I also thought in the beginning that I could get the function instance, but after an intense debate with one of the PyPy maintainers, I realized that the function objects are disposed of eagerly unless they're in globals or locals.

agronholm avatar Feb 27 '25 13:02 agronholm

Thanks :-). Yeah I definitely misunderstood what accessing the function name from inside the function does. I thought it was a special trick, but it just accesses the name from the enclosing scope. So in some cases this would work (even with nested functions). But it is not reliable at all.

After playing with this for a while, I am now ready to accept that there is no way to reliably access the function object from within that function, not even when resorting to sys._getframe(0).

So for using typing.get_type_hints() it would have to be done from the outside. Is there a reason you moved the argument checking into the function via instrumentation (other than the ability to inject checks for variable annotations in the body)? Or to put differently: Is there a reason not to do this:

def typechecked(fun):
  ...
  fun = instrument(fun) # for body checks only 
  ...
  @functools.wraps(fun)
  def inner(*args, **kwargs):
    if not (hints := getattr(fun, '_cached_hints', None)):
      hints = typing.get_type_hints(fun)
      fun._cached_hints = hints
    check_function_args(args, kwargs, hints)
    result =  fun(*args, **kwargs)
    check_return_value(result, hints)
    return result

  return inner

Qwlouse avatar Feb 27 '25 14:02 Qwlouse

So for using typing.get_type_hints() it would have to be done from the outside. Is there a reason you moved the argument checking into the function via instrumentation (other than the ability to inject checks for variable annotations in the body)?

Yes, there are two reasons:

  1. Debugging. Each wrapped function generates two stack frames, one for the wrapper and one for the actual function, and this tends to add up in long call stacks and make the use of debuggers much more painful (although tricks like __traceback_hide__ can mitigate the issue). The @typechecked decorator did used to wrap the containing function, and I may revert to doing just this in the next major release, but it doesn't help with the auto-instrumented code.
  2. Compatibility with other decorators: it's often not clear where in the decorator stack @typechecked should go. Well written wrappers use @functools.wraps but in reality, not every library author has knowledge of this, plus some wrappers are built-in (C code) and are thus inflexible about this.

agronholm avatar Feb 27 '25 14:02 agronholm

To upgrade to the new version of typeguard while still keeping it compatible with jaxtyping (a combination we use heavily), I ended up implementing an escape hatch for the instrumentation along the lines of this PR: #519

Would you consider adding something like that?

Qwlouse avatar Mar 19 '25 14:03 Qwlouse

The next major release will switch back to using function wrapping in @typechecked - would that work for you?

agronholm avatar Mar 20 '25 09:03 agronholm

The main reason being that instrumenting individual functions like that is terribly expensive and causes import times to soar, plus it doesn't work in the REPL.

agronholm avatar Mar 20 '25 09:03 agronholm

Yes that works for me, Thanks!

Qwlouse avatar Mar 20 '25 17:03 Qwlouse