# Syntax error for custom type annotations
### Things to check first

- [x] I have searched the existing issues and didn't find my bug already reported there
- [x] I have checked that my bug is still present in the latest release
### Typeguard version

4.4.1

### Python version

3.11
### What happened?

`typeguard.typechecked` fails with a syntax error for annotations that use item access with a string that is not eval-able. Basic example:
```python
import typeguard

annot = {"a b": int, "c d": str}

@typeguard.typechecked
def foo1(x: annot["a b"]) -> int:
    return 10
```
yields
```
  File "<unknown>", line 1
    a b
      ^
SyntaxError: invalid syntax
```
This problem occurs with custom annotation types such as jaxtyping, which uses annotations like `Float[np.ndarray, "H W C"]` to encode shape information. The problem was previously raised in #353 for the special case of `Annotated`. The fix from be4dd33 solves this for plain `Annotated`, but is quite brittle. For example, this fails:
```python
import typing

MyAnnotated = typing.Annotated  # simple alias

@typeguard.typechecked
def foo2(x: MyAnnotated[int, "a b"]) -> int:
    return 10
```
The problem in both cases is that the instrumentation tries to evaluate `"a b"` as a forward reference, which leads to a syntax error. I don't think this is the desired behaviour. Custom annotations are allowed by Python and should not lead to errors. For reference, the `typing.get_type_hints()` function succeeds in both cases:
```python
typing.get_type_hints(foo1)
# {'x': int, 'return': int}
typing.get_type_hints(foo2)
# {'x': int, 'return': int}
```
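For context, a minimal sketch of why the string trips up the instrumentation, assuming (as the traceback suggests) that string annotations are funnelled through Python's expression parser as forward references:

```python
import ast

# A string annotation is treated as a forward reference, i.e. it must
# parse as a Python expression. "a b" does not, hence the SyntaxError:
try:
    ast.parse("a b", mode="eval")
except SyntaxError as exc:
    print(type(exc).__name__)  # SyntaxError

# An eval-able string like "int" parses fine:
ast.parse("int", mode="eval")
```
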
I suggest solving this by changing the instrumented code for a function to use `typing.get_type_hints()` rather than trying to manually catch all corner cases during AST parsing.
Currently the instrumented code for a simple function looks like this:
```python
def bar(x: int, y: str) -> None:
    ...
    return

# ---- instrumented ----
def bar(x: int, y: str) -> None:
    from typeguard import TypeCheckMemo
    from typeguard._functions import check_argument_types, check_return_type
    memo = TypeCheckMemo(globals(), locals())
    check_argument_types('bar', {'x': (x, int), 'y': (y, str)}, memo)
    ...
    return check_return_type('bar', None, None, memo)
```
The problem could be fixed if, instead of trying to resolve the forward references during AST parsing, the `check_argument_types` function simply called `typing.get_type_hints()`. So with a new `check_argument_types_v2`:
```python
def check_argument_types_v2(fun, args, memo):
    name = fun.__name__
    hints = typing.get_type_hints(fun)
    annotated_arguments = {
        k: (v, hints[k])
        for k, v in args.items()
        if k in hints
    }
    return check_argument_types(name, annotated_arguments, memo)
```
The instrumented code could become:
```python
def bar(x: int, y: str) -> None:
    from typeguard import TypeCheckMemo
    from typeguard._functions import check_argument_types_v2, check_return_type
    memo = TypeCheckMemo(globals(), locals())
    check_argument_types_v2(bar, {'x': x, 'y': y}, memo)
    ...
    return check_return_type('bar', None, None, memo)  # (should probably do the same here)
```
This should avoid all the above edge cases and be fairly straightforward to implement. I have a proof of concept working. Happy to submit it as a PR if that is wanted (though I am not very familiar with the AST-parsing code, and I don't feel confident in my edits).
### How can we reproduce the bug?
```python
import typeguard

annot = {"a b": int}

@typeguard.typechecked
def foo1(x: annot["a b"]) -> int:
    return 10
```
I think there could be a compromise: using `get_type_hints()` when an annotation involving a string is present (excluding the use of `typing.Literal`). While some projects still use a lot of string-based annotations, I'm seeing them less and less, making this likely a non-issue.
Wow, thanks for the quick response.
A compromise solution would be possible, I agree. But I do not see any advantage.
To me it seems that the `get_type_hints` approach would be more robust, without any disadvantages.
What downsides do you see with the `get_type_hints` approach? Am I missing something?
I guess one problem that I can see is that this only works for the signature of the function, not for variable annotations in the function body. So fixing this for variable annotations might require more work.
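A quick illustration of that limitation (the `"a b"` annotation string is just a placeholder to show the point): annotations on local variables are not stored on the function object, so `typing.get_type_hints()` never sees them at all.

```python
import typing

def f(x: int) -> None:
    # Local variable annotations are discarded at runtime; they never
    # end up in f.__annotations__, so get_type_hints() never sees them:
    y: "a b" = None

print(typing.get_type_hints(f))  # {'x': <class 'int'>, 'return': <class 'NoneType'>}
```
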
The downside is that the results can't be cached. This is the reason why I was so aggressive in removing the quotes. But performance should never come at the cost of correctness.
Ah, I see. Performance concerns are a good point. `get_type_hints` can be quite slow.
But why couldn't the results be cached? Is there a problem with something like this?
```python
def check_argument_types_v2(fun, args, memo):
    name = fun.__name__
    if not (hints := getattr(fun, '_cached_type_hints', None)):
        hints = typing.get_type_hints(fun)
        fun._cached_type_hints = hints
    annotated_arguments = {
        k: (v, hints[k])
        for k, v in args.items()
        if k in hints
    }
    return check_argument_types(name, annotated_arguments, memo)
```
Where do you get `fun` from?
Inside a function you have access to the function object. So I would just pass that along to the check_argument_types function.
```python
def foo(x: int):
    check_argument_types_v2(foo, {"x": x})
```
> Inside a function you have access to the function object.

No, you don't. I cannot stress how much easier my life would've been if this was the case!
I have good news for you then. Because this works. Try it. 😉
```python
def foo():
    print(foo)

foo()
```
https://stackoverflow.com/questions/852401/python-getting-a-reference-to-a-function-from-inside-itself
> I have good news for you then. Because this works. Try it. 😉
>
> ```python
> def foo(): print(foo)
> foo()
> ```
>
> https://stackoverflow.com/questions/852401/python-getting-a-reference-to-a-function-from-inside-itself
Did you honestly think I never tried this before? Trivial cases are not useful to me. Consider the following:
```python
def outer() -> None:
    def inner(x: int, y: str) -> None:
        pass

    return inner

outer()(1, "foo")
```
Now please explain how I can get a reference to the `inner()` function from inside it.
Another example:
```python
def wrapper(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)

    return inner

@wrapper
def real_function(x: int, y: int) -> None:
    pass
```
This is another common pattern, and from within `real_function()`, the name `real_function` only gets you the wrapper, not the actual function that carries the annotations.
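The wrapper case can be verified directly. A small sketch (no typeguard involved) showing that a name lookup from inside the decorated function finds the wrapper, not the function itself:

```python
def wrapper(func):
    def inner(*args, **kwargs):  # deliberately no functools.wraps
        return func(*args, **kwargs)
    return inner

@wrapper
def real_function():
    # The decorator rebinds the global name "real_function" to the
    # wrapper, so looking the name up from in here yields the wrapper:
    return real_function

assert real_function() is real_function   # name lookup finds the wrapper...
assert real_function.__name__ == "inner"  # ...i.e. the undecorated inner()
```
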
Oh, I see. I can see now that I have underestimated both you and the problem. I apologize! Thank you for explaining.
> Oh, I see. I can see now that I have underestimated both you and the problem. I apologize! Thank you for explaining.
To be fair, I also thought in the beginning that I could get the function instance, but after an intense debate with one of the PyPy maintainers, I realized that the function objects are disposed of eagerly unless they're in globals or locals.
Thanks :-). Yeah, I definitely misunderstood what accessing the function name from inside the function does. I thought it was a special trick, but it just looks the name up in the enclosing scope. So in some cases this would work (even with nested functions), but it is not reliable at all.
After playing with this for a while, I am now ready to accept that there is no way to reliably access the function object from within that function, not even by resorting to `sys._getframe(0)`.
So to use `typing.get_type_hints()`, it would have to be done from the outside. Is there a reason you moved the argument checking into the function via instrumentation (other than the ability to inject checks for variable annotations in the body)?
Or to put it differently: is there a reason not to do this:
```python
def typechecked(fun):
    ...
    fun = instrument(fun)  # for body checks only
    ...

    @functools.wraps(fun)
    def inner(*args, **kwargs):
        if not (hints := getattr(fun, '_cached_hints', None)):
            hints = typing.get_type_hints(fun)
            fun._cached_hints = hints
        check_function_args(args, kwargs, hints)
        result = fun(*args, **kwargs)
        check_return_value(result, hints)
        return result

    return inner
```
> So to use `typing.get_type_hints()`, it would have to be done from the outside. Is there a reason you moved the argument checking into the function via instrumentation (other than the ability to inject checks for variable annotations in the body)?
Yes, there are two reasons:

- Debugging. Each wrapped function generates two stack frames, one for the wrapper and one for the actual function, and this tends to add up in long call stacks and make the use of debuggers much more painful (although tricks like `__traceback_hide__` can mitigate the issue). The `@typechecked` decorator did use to wrap the containing function, and I may revert to doing just that in the next major release, but it doesn't help with the auto-instrumented code.
- Compatibility with other decorators: it's often not clear where in the decorator stack `@typechecked` should go. Well-written wrappers use `@functools.wraps`, but in reality not every library author knows about this, plus some wrappers are built-in (C code) and are thus inflexible about this.
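To make the second point concrete, a sketch (not typeguard's actual code) of how a wrapping-based checker loses the annotations when it sits outside a wrapper that omits `functools.wraps`:

```python
import typing

def wrapper(func):  # a wrapper that forgets functools.wraps
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner

@wrapper
def real_function(x: int, y: int) -> None:
    pass

# A checker applied on top of `wrapper` only ever sees the bare inner(),
# which carries no annotations at all:
print(typing.get_type_hints(real_function))  # {}
```
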
To upgrade to the new version of typeguard while keeping it compatible with jaxtyping (a combination we use heavily), I ended up implementing an escape hatch for the instrumentation along the lines of this PR: #519
Would you consider adding something like that?
The next major release will switch back to using function wrapping in `@typechecked` - would that work for you?
The main reason is that instrumenting individual functions like that is terribly expensive and causes import times to soar, plus it doesn't work in the REPL.
Yes, that works for me. Thanks!