pytype
Spurious "appears only once in the signature" error for TypeVars in classmethods
Consider the following example:
from typing import Sequence
from typing import TypeVar
T = TypeVar("T")
def size(seq: Sequence[T]) -> int:
    return len(seq)
Type checking through pytype will fail with the following error, even though this is perfectly valid code:
Invalid type annotation 'T' [invalid-annotation] Appears only once in the signature
There are also more sophisticated examples (which I have encountered more than once in real code) where this check was bogus. mypy does not complain about this, even in strict mode, which contradicts the claim that pytype is lenient.
There are definitely cases in which [invalid-annotation] fires incorrectly for a TypeVar (https://github.com/google/pytype/issues/379 is the big one that we're aware of), but this isn't one of them. What purpose does T serve in def size(seq: Sequence[T]) -> int? The interpretation pytype subscribes to is that a TypeVar has to appear twice in the same generic scope to be meaningful, so it is intended that introducing a meaningless TypeVar produces an error.
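For contrast, here is a minimal sketch of a signature in which T appears twice and therefore carries information under this interpretation. The function name first is a hypothetical example and does not come from the original report.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first(seq: Sequence[T]) -> T:
    # T appears in both the parameter and the return annotation, so it
    # links the element type of the argument to the type of the result.
    return seq[0]


print(first([1, 2, 3]))       # 1
print(first(["foo", "bar"]))  # foo
```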
Do you have one of the more sophisticated examples of [invalid-annotation] on hand? I don't have enough information here to figure out what bug you may be seeing.
While I don't want to get too side-tracked on this - mypy isn't the gold standard for how type checkers should behave; PEP 484 is. And "lenient" is not a synonym for "accepts everything, right or wrong".
@rchen152
What purpose does T serve in def size(seq: Sequence[T]) -> int?
The purpose is to express "it accepts a collection of any type". How else would you type it? In other statically typed languages, e.g. Haskell, length has a similar type ([a] -> Int, where [a] denotes a list of elements of type a).
Do you have one of the more sophisticated examples of [invalid-annotation] on hand?
Sure, here's something I stumbled upon recently. Whether this is well-designed code is of course debatable, but it should type check (it does under mypy).
from abc import abstractmethod
from typing import Generic
from typing import TypeVar
T = TypeVar("T")
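class Foo(Generic[T]):
    # abstractmethod has to be the innermost decorator; applying it on top
    # of classmethod raises AttributeError when the class body is executed.
    @classmethod
    @abstractmethod
    def quux(cls) -> T:
        pass

class Bar(Foo[str]):
    @classmethod
    def quux(cls) -> str:
        return "bar"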
And "lenient" is not a synonym for "accepts everything, right or wrong".
The definition of "lenient" in pytype's README file says:
Pytype is lenient instead of strict. That means it allows all operations that succeed at runtime and don't contradict annotations.
Both examples succeed at run-time and they don't contradict annotations. Disallowing unused variables (in this case: type variables) is a job of a linter, not a type checker, in my opinion.
The way to express "a sequence of any type" would be to use typing.Any:
from typing import Any, Sequence
def size(seq: Sequence[Any]) -> int:
    return len(seq)
Thanks for the additional example. I simplified it further down to the following to demonstrate the bug more clearly:
from typing import Generic, TypeVar
T = TypeVar("T")
class Foo(Generic[T]):
    @classmethod
    def quux(cls) -> T:  # pytype reports [invalid-annotation] here
        pass
I'm slightly skeptical that it makes sense for a classmethod to return a TypeVar bound to the class (what type should Foo.quux() have?), but I can imagine there might be cases in which a classmethod should return T when called on an instance of the class but fall back to Any otherwise. I'll update the title to reflect that this issue is about classmethods.
It's not the case that the example of a single TypeVar "[doesn't] contradict annotations" - pytype can't tell whether the annotation is contradicted or not because it doesn't know how to interpret the annotation in the first place! The TypeVar can't just be ignored; its presence triggers special matching rules for generic functions, leading to unexpected behavior when the assumption that the function is generic isn't met.
The way to express "a sequence of any type" would be to use typing.Any:
from typing import Any, Sequence
def size(seq: Sequence[Any]) -> int:
    return len(seq)
But this is different. Any is a type for "bailing out" of type checking in code that cannot be properly typed, and it introduces type unsafety. If we were to use Any, one could add an erroneous line to the function and it would still type check (both under mypy and pytype) but fail at runtime:
from typing import Any
from typing import Sequence
def size(seq: Sequence[Any]) -> int:
    seq[0].foobar()
    return len(seq)
size([1, 2, 3]) # Raises `AttributeError: 'int' object has no attribute 'foobar'` at runtime.
Adding the same erroneous line into my example will make both type checkers notice the issue:
- mypy:
error: "T" has no attribute "foobar"
- pytype:
No attribute 'foobar' on int [attribute-error]
I'm slightly skeptical that it makes sense for a classmethod to return a TypeVar bound to the class (what type should Foo.quux() have?)
This is why in my example it was an abstract method that a subclass has to implement with some concrete type.
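As a rough runtime illustration of that intent, here is a sketch in which the abstract classmethod is only ever called through a concrete subclass. Deriving Foo from ABC is an assumption added here so that the abstractness is actually enforced; the example earlier in the thread does not do this.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar("T")


class Foo(ABC, Generic[T]):
    # abstractmethod is applied innermost, as the abc documentation requires.
    @classmethod
    @abstractmethod
    def quux(cls) -> T:
        ...


class Bar(Foo[str]):
    # The subclass fixes T to str and provides the implementation.
    @classmethod
    def quux(cls) -> str:
        return "bar"


print(Bar.quux())  # bar

try:
    Foo()
except TypeError:
    # Foo still has an unimplemented abstract method, so it cannot be
    # instantiated.
    print("Foo is abstract")
```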
Huh, I'll admit it's sort of neat that pytype captures the argument type in T. Unfortunately, this falls into the category of "unexpected behavior": the unintentional, untested behavior that pytype displays when given erroneous annotations will sometimes look correct, depending on your expectations, but can't be relied on.
If you want an annotation that reflects that size can be called with a sequence of anything and that also produces errors if you try any operation that is not defined on every possible object, then Sequence[object] is probably what you're looking for.
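A minimal sketch of that suggestion, reusing the size function from the thread with the annotation swapped to Sequence[object]:

```python
from typing import Sequence


def size(seq: Sequence[object]) -> int:
    # Elements are typed as object, so only operations that every object
    # supports are allowed; a line such as seq[0].foobar() should be
    # rejected by a type checker.
    return len(seq)


print(size([1, 2, 3]))       # 3
print(size(["foo", "bar"]))  # 2
```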
As a side note, mypy appears to throw an attribute error for any attribute accessed on seq[0]. Even if you replace .foobar() with something like .real that exists on ints, the error remains.
This is why in my example it was an abstract method that a subclass has to implement with some concrete type.
Ah, that makes sense. Thanks for the clarification.
If you want an annotation that will reflect that size can be called with a sequence of anything and also produce errors if you try to do any operation that is not defined on every possible object, then Sequence[object] is probably what you're looking for.
Yes, in this very example Sequence[object] would do, because Sequence is covariant. However, this will not work for something like MutableSequence, which is invariant. Consider the following function:
from typing import MutableSequence
from typing import TypeVar
T = TypeVar('T')
def swap(seq: MutableSequence[T], i: int, j: int) -> None:
    seq[i], seq[j] = seq[j], seq[i]
With this definition, everything is fine, we can use it on lists of different types:
l_int = [1, 2, 3]
swap(l_int, 0, 2)
print(l_int) # => [3, 2, 1]
l_str = ['foo', 'bar', 'baz']
swap(l_str, 0, 2)
print(l_str) # => ['baz', 'bar', 'foo']
Except for the "appears only once in the signature" error in pytype, everything type checks (pytype does not complain about anything else, and mypy is completely happy even in strict mode).
However, in this example, we cannot get rid of T. Using Any leads to the issues I mentioned. Using object doesn't work because of the invariance of MutableSequence:
from typing import MutableSequence
def swap(seq: MutableSequence[object], i: int, j: int) -> None:
    seq[i], seq[j] = seq[j], seq[i]
l_int = [1, 2, 3]
swap(l_int, 0, 2)
l_str = ['foo', 'bar', 'baz']
swap(l_str, 0, 2)
This will fail in mypy with the following errors:
error: Argument 1 to "swap" has incompatible type "List[int]"; expected "MutableSequence[object]"
error: Argument 1 to "swap" has incompatible type "List[str]"; expected "MutableSequence[object]"
At the moment, pytype is happy with this code only because, as far as I understand, it does not yet check variance correctly. Once it does, it will be impossible to type this function properly without a type variable that occurs only once.
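The reason a checker treats MutableSequence as invariant can be sketched as follows. The helper append_text is hypothetical and only serves to show what could go wrong if a List[int] were accepted where a MutableSequence[object] is expected.

```python
from typing import List, MutableSequence


def append_text(seq: MutableSequence[object]) -> None:
    # Valid for a MutableSequence[object]: any object may be appended.
    seq.append("oops")


ints: List[int] = [1, 2, 3]
# A checker that enforces invariance rejects this call. At runtime it
# succeeds and leaves a str inside a list annotated as List[int].
append_text(ints)
print(ints)  # [1, 2, 3, 'oops']
```

swap never inserts new elements, so it is safe for any element type, which is what the single type variable expresses.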
As a side note, mypy appears to throw an attribute error for any attribute accessed on seq[0]. Even if you replace .foobar() with something like .real that exists on ints, the error remains.
Hm, I am not sure what you mean. Could you show a more concrete example where mypy complains but it shouldn't?
Coincidentally, the topic of whether a single TypeVar is allowable in a function signature came up in a meeting I was in this morning (the monthly tensor typing call organized over the public typing-sig mailing list), so we were able to briefly discuss this in a group that included some representation from pyre (the Facebook type checker), although unfortunately not mypy.
- We were in agreement that the typing specification does not cover how type checkers ought to treat a single TypeVar, so there is currently no authoritative answer here.
- pyre allows single TypeVars, with the reasoning that if it is not required by the spec to be an error, it shouldn't be an error.
- A proposed use case for a single ListVariadic (a speculative new kind of TypeVar from a draft PEP) came up, as a placeholder to represent an unused part of an annotation.
- A pytype user offered their personal experience that the [invalid-annotation] error was helpful when initially figuring out how to use Python type annotations.
I've kicked off a typing-sig thread (https://mail.python.org/archives/list/typing-sig@python.org/thread/NRFNHGXHXPGBR6FP3TIOZZ6VS4XJZX6K/) to try to get some more opinions on this matter.
Hm, I am not sure what you mean. Could you show a more concrete example where mypy complains but it shouldn't?
I was referring to the error mypy raises in this case:
from typing import Sequence, TypeVar
T = TypeVar('T')
def size(seq: Sequence[T]) -> int:
    seq[0].foobar()  # mypy: error: "T" has no attribute "foobar"
    return len(seq)
size([1, 2, 3])
With the way this example is constructed, it looks like the error is raised because int does not have a foobar attribute. If you replace foobar with an attribute that int does have, you still get an error (which isn't wrong, I was just pointing out that the typevar is not capturing the element type from the function call, despite what it looks like).
A proposed use case for a single ListVariadic (a speculative new kind of TypeVar from a draft PEP) came up, as a placeholder to represent an unused part of an annotation.
Interesting, is this draft PEP available somewhere? I wonder, would it be something akin to Java's ? type wildcard syntax? In Java the following is perfectly valid:
public static <T> int size(List<T> list) {
    return list.size();
}
But the preferred way would be the following:
public static int size(List<?> list) {
    return list.size();
}
The funny thing is, I even worked on detecting this very issue (replacing unused type variables with wildcards) for Error Prone!
I've kicked off a typing-sig thread (https://mail.python.org/archives/list/typing-sig@python.org/thread/NRFNHGXHXPGBR6FP3TIOZZ6VS4XJZX6K/) to try to get some more opinions on this matter.
Thanks for doing that!
With the way this example is constructed, it looks like the error is raised because int does not have a foobar attribute. If you replace foobar with an attribute that int does have, you still get an error (which isn't wrong, I was just pointing out that the typevar is not capturing the element type from the function call, despite what it looks like).
I think it is because mypy and pytype use different type checking strategies for generics. Consider the following function:
from typing import Iterable
from typing import TypeVar
T = TypeVar('T')
def concat(values: Iterable[T]) -> T:
    return ''.join(values)
In mypy it will fail to type check, because the definition itself is verified:
error: Incompatible return value type (got "str", expected "T")
error: Argument 1 to "join" of "str" has incompatible type "Iterable[T]"; expected "Iterable[str]"
With pytype, the definition is not checked until the function is actually used, so it will allow calling it like this:
print(concat(['foo', 'bar', 'baz']))
However, it will fail if used like this:
print(concat([1, 2, 3]))
# Function str.join was called with the wrong arguments [wrong-arg-types]
# Expected: (self, iterable: Iterable[str])
# Actually passed: (self, iterable: Iterable[int])
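For comparison, here is a sketch of a signature that states what the body actually requires. It is not part of the original example, but it should be acceptable under both strategies, since no type variable is involved.

```python
from typing import Iterable


def concat(values: Iterable[str]) -> str:
    # str.join needs an iterable of str, and the annotation now says so,
    # so the definition can be verified without looking at any call site.
    return "".join(values)


print(concat(["foo", "bar", "baz"]))  # foobarbaz
```

With this signature, a call such as concat([1, 2, 3]) should be reported at the call site by either checker.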
So, mypy follows the strategy that most languages with generics use (e.g. Java, C#, TypeScript, Rust, Haskell). The only two languages I know of that follow pytype's strategy are C++ (templates) and Crystal. Having worked with both, in my experience the first strategy has some advantages:
- The signature is a contract: the function is not allowed to do anything that is not specified in its header. If the function works with any generic type T, then the caller can be sure that it will not invoke any potentially dangerous method, because it cannot assume that the method is there.
- Better error messages. With the templating approach, if the caller calls a generic function that calls another generic function and so on, then one has to go through the entire stack of functions and implementation details to figure out what went wrong (who hasn't had to go through pages of errors after making a single typo in the type of an STL container...?).
- Faster error discovery. A generic function can be introduced in commit A and then start to be used in a separate commit B. However, until commit B is made, it is unclear whether A is really type safe: if it is unused, it is not type checked.
- Compilation is faster, because functions are checked in isolation and don't have to be monomorphised all the way down at each call site.
Having said this, I can see that pytype's strategy can be useful when gradually migrating a completely dynamic, duck-typed codebase to a statically typed one without going through significant refactoring along the way. I suspect this is the "lenience" that is mentioned in pytype's README file.