Add typings
This is a work in progress and still produces errors when type checking the codebase. I added type hints to the Python files where it was feasible and updated the code of the methods to be type safe. In two places (pool.py and exceptions/_base.py), adding type hints to the code itself would not work because of the dynamic nature of the code in those modules. There may also be places where I'm making wrong assumptions about the code (especially in the cython code), so a thorough review would be very welcome. Lastly, I did not add typings for asyncpg._testbase since that seems to be an internal testing module.
Some of the remaining problems that mypy finds may be bugs in the code or they may be that mypy is being overly strict:
-
cursor.py: InBaseCursor.__repr__,self._statecould technically beNone, and cause an exception whenself._state.queryis used -
connection.py:- In Python 3.7+,
asyncio.current_task()can returnNone, socompat.current_task()has to be typed as returningOptional[Task[Any]]. This meansConnection._cancel()may throw an exception whenself._cancellations.discard(compat.current_task(self._loop))is called - The code in
_extract_stack()has a couple of issues:- Passing an iterator to
StackSummary.extract()but it expects a generator -
__path__does not exist on theasyncpgmodule
- Passing an iterator to
- In Python 3.7+,
-
connect_utils.pyhas several errors that relate to the code in_parse_connect_dsn_and_args()assigning new types to variables originally declared as another type. I can try to clean this up to make it more type-safe, or just add# type: ignore. -
cluster.py:- There are a couple of places where a
bytesis being passed to a formatting string which mypy recommends using!rwith those - In
Cluster.connect(),self.get_connection_spec()can returnNone, which would causeconn_info.update()to throw an error. Is this desired? - A similar issue can be found in
Cluster._test_connection()withself._connection_addr
- There are a couple of places where a
References #569, #387
One thing to note is that the handling of Record isn't quite what I'd like it to be yet. Currently, a user could do something like this:
class MyRecord(asyncpg.Record):
id: int
name: str
created_at: datetime.datetime
r: MyRecord = ...
reveal_type(r.id) # int
The user will only use this class for type checking and can use attribute access with type checking, but not lookup notation. A mypy plugin can probably be written to handle the lookup notation (TypedDict has something similar), so this isn't a big issue. The bigger issue is the following will be an error:
records: typing.List[MyRecord] = await conn.fetch('SELECT ... from ...')
It's an error because fetch() is typed to return List[Record], which doesn't match List[MyRecord] (List is invariant, so it has to match exactly). There are two options:
- Return a bound generic from every function that returns a
Record. This mostly works, but it requires the user to type their variables, even if they're not using a custom class:
record: MyRecord = await conn.fetchrow('SELECT ... FROM ...')
records: typing.List[MyRecord] = await conn.fetch('SELECT ... FROM ...')
cursor: asyncpg.cursor.CursorFactory[MyRecord] = await conn.cursor('SELECT ... FROM ...')
record2: asyncpg.Record = await conn.fetchrow('SELECT ... FROM ...')
records2: typing.List[asyncpg.Record] = await conn.fetch('SELECT ... FROM ...')
cursor2: asyncpg.cursor.CursorFactory[asyncpg.Record] = await conn.cursor('SELECT ... FROM ...')
- Add a
record_classparameter to methods to any function that returns aRecord. This, in my opinion, works the best but adds an unused (code-wise) parameter:
# the variables below have the same types as the last code block,
# but the types are inferred instead
record = await conn.fetch('SELECT ... FROM ...', record_class=MyRecord)
records = await conn.fetch('SELECT ... FROM ...', record_class=MyRecord)
cursor = await conn.cursor('SELECT ... FROM ...', record_class=MyRecord)
record2 = await conn.fetchrow('SELECT ... FROM ...')
records2 = await conn.fetch('SELECT ... FROM ...')
cursor2 = await conn.cursor('SELECT ... FROM ...')
Note that while mypy will see record as a MyRecord, it will be an instance of Record at runtime because record_class is only used to infer the type of the return.
I have the second option coded locally, but I wanted to check if adding that extra record_class parameter for type-checking purposes is OK.
There's a third option: lie about the return type and say it's a Sequence[Record] instead. I don't think the results of fetch() are mutated that often (why would you?) so this shouldn't be much of an issue.
Typing it as Sequence[Record] kind of works, but you'd still need a cast to go from Sequence[Record] to Sequence[MyRecord]. Combining Sequence with option 2 is better than List (since Sequence is covariant), but it still requires adding type information for all of your variables storing the results from functions returning Record.
but you'd still need a cast
Right. Well, my concern with the record_class parameter is that it could be very confusing to unknowing users. If we named it record_class, then one might rightfully assume that the returned records will be instances of that class, when, in fact, they wouldn't be.
You'd mentioned that a plugin is necessary to handle subscript access, so perhaps said plugin can also deal with the fetch calls (via a function hook). The plugin would only need to check that the lval type is a covariant sequence of Record and adjust the return type of fetch() to it to make mypy happy.
Right. Well, my concern with the
record_classparameter is that it could be very confusing to unknowing users.
I understand the concern and share your concern. Would record_type be less confusing?
The plugin would only need to check that the lval type is a covariant sequence of
Recordand adjust the return type offetch()to it to make mypy happy.
I'll have to do some digging to see if this is possible. If it is, I agree that it would probably be the best solution.
I've done quite a bit more work on the mypy plugin and have the following working:
import asyncpg
import datetime
import typing
import typing_extensions
class MyRecord(asyncpg.Record):
foo: int
bar: typing.Optional[str]
class MyOtherRecord(MyRecord):
baz: datetime.datetime
async def main() -> None:
conn = await asyncpg.connect(...)
m = typing.cast(MyRecord, await conn.fetchrow('SELECT foo, bar FROM records'))
o = typing.cast(MyOtherRecord, await conn.fetchrow('SELECT foo, bar, baz FROM other_records'))
key = 'baz'
fkey: typing_extensions.Final = 'baz'
reveal_type(m['foo']) # int
reveal_type(m['bar']) # Optional[str]
reveal_type(m['baz']) # error: "MyRecord" has no key 'baz'
reveal_type(m[0]) # int
reveal_type(m[1]) # Optional[str]
reveal_type(m[2]) # error: "MyRecord" has no index 2
reveal_type(m.get('foo')) # int
reveal_type(m.get('bar')) # Optional[str]
reveal_type(m.get('baz')) # error: "MyRecord" has no key 'baz'
reveal_type(m.get('baz', 1)) # Literal[1]
reveal_type(m.get(key, 1)) # Union[Any, int]
reveal_type(m.get(fkey, 1)) # Literal[1]
reveal_type(o['foo']) # int
reveal_type(o['bar']) # Optional[str]
reveal_type(o['baz']) # datetime
reveal_type(o[0]) # int
reveal_type(o[1]) # Optional[str]
reveal_type(o[2]) # datetime
reveal_type(o.get('foo')) # int
reveal_type(o.get('bar')) # Optional[str]
reveal_type(o.get('baz')) # datetime
reveal_type(o.get('baz', 1)) # datetime
reveal_type(o.get(key, 1)) # Union[Any, int]
reveal_type(o.get(fkey, 1)) # datetime
Based on the implementation of asyncpg.Record, I believe I have the typing for __getitem__() and get() correct. I tried to get the typings for Record to be as similar to TypedDict as possible (given the implementation differences). You'll notice that when the key can be determined by the type system, get() with a default argument is deterministic (ex. o.get('baz', 1) and o.get(fkey, 1)), otherwise it returns a Union. One thing I'd like to possibly try is to come up with a metaclass that would act like a typing_extensions.Protocol with runtime checking so instanceof() could be used to determine if a Record matched. At this time, I haven't attempted it.
I also did a lot of poking around to try and get the plugin to set the return type based on the variable it is being assigned to, but I don't see a way that it's possible. This means we're left with two options:
- Force users to cast to subclasses of
asyncpg.Record(as seen in the examples above). This means types likePreparedStatementwould need to be exposed fromasyncpgso users could easily cast the result ofConnection.prepare()toasyncpg.PreparedStatement[MyRecord]:
stmt = typing.cast(asyncpg.prepared_stmt.PreparedStatement[MyRecord], await conn.prepare('SELECT ... FROM ...'))
reveal_type(stmt) # asyncpg.prepared_stmt.PreparedStatement[MyRecord]
- Add an unused parameter to indicate to the type system which type should be inferred (and if it's left off, the return type would be
asyncpg.Record). I suggestedrecord_classabove, but I thinkreturn_typewould be less prone to users thinking the result would be a different class. This approach would mean that the result of calls to functions likeprepare()wouldn't need to be cast to a subscript and the type system would infer the subscript:
stmt = await conn.prepare('SELECT ... FROM ...', return_type=MyRecord)
reveal_type(stmt) # asyncpg.prepared_stmt.PreparedStatement[MyRecord]
There's also a possible third option if the Protocol-like class actually works: require users to use isinstance() to narrow the type:
stmt = await conn.prepare('SELECT ... FROM ...')
reveal_type(stmt) # asyncpg.prepared_stmt.PreparedStatement[asyncpg.protocol.protocol.Record]
record = await stmt.fetchrow(...)
reveal_type(record) # Optional[asyncpg.protocol.protocol.Record]
assert isinstance(record, MyRecord)
reveal_type(record) # MyRecord
The third option completely depends on whether the Protocol-like class is feasible, and would also do runtime checking (which would check if Record.keys() has all of the keys of the subclass). I would imagine the runtime checking would be a performance hit.
Awesome work on the plugin, @bryanforbes! Thanks!
Add an unused parameter to indicate to the type system which type should be inferred
I'm still a bit uneasy with adding unused parameters just for the typing purpose. That said, if we had record_class= actually make fetch() and friends return instances of that class, that would be a great solution. There's actually #40, which quite a few people requested before.
@elprans Thanks! There are still some places in the code where # type: ignore is being used. I've updated them to use the specific code(s) that can be used to ignore them (so if new type errors arise, mypy doesn't ignore those as well). Some of them can be ignored. For example, anywhere TypedDict is being used to ensure a dictionary has the right shape, mypy will complain when using ** on it. Others indicate a possible issue with the code that I wasn't sure how to fix. I'll list those here (using the latest commit I've just pushed):
-
connection.pyline 1264: All of the methods use_protocolwithout checking if it'sNone, so I typed it asBaseProtocol. However, when the connection is abortedNoneis assigned to_protocolso_protocolshould technically be typed asOptional[BaseProtocol]and be checked everywhere it's used (with a descriptive error) so the methods of theConnectiondon't throw a strange error about an internal member if they're used after aborting the connection. -
connection.pyline 1357:compat.current_asyncio_task()can returnNone(because the standard library can),_cancellationsis typed asSet[asyncio.Task[Any]], sodiscard()rejects passingNoneto it. -
cluster.pylines 129, 547, 604:mypygives the following error:On Python 3 '{}'.format(b'abc') produces "b'abc'"; use !r if this is a desired behavior. This probably should be updated to use!r, but I wasn't sure. -
cluster.pyline 136:get_connection_spec()can returnNonewhich would throw an error about an internal variable beingNone. This should probably be checked to see if it'sNone. -
cursor.pyline 169:_statecan beNone, which would cause an exception to be raised here.
With regards to the unused parameter, I completely understand your concern. I also think that making fetch() and friends return the instances of that class would be a good solution. I can look into that, but my experience with cpython is fairly limited (I've used cython a long long time ago, so it's much more familiar). Let me know how you'd like to proceed.
@bryanforbes PR #599 was merged! I don't know if it was a blocker for this PR or not (by reading the discussion here I think it was). Thanks for this PR, really, looking forward to it! :smile:
@victoraugustolls I'll rebase and update this PR today
@victoraugustolls I finished the rebase, but there's an issue with Python 3.6 and ConnectionMeta + Generic related to https://github.com/python/typing/issues/449. I'll poke around with it this weekend and try to get it working.
Thanks for the update @bryanforbes ! I will try and search for something that can help
@victoraugustolls I was able to come up with a solution for the metaclass issue in Python 3.6 (that won't affect performance on 3.7+), but I'd like to work on some tests before taking this PR out of draft. Feel free to review the code, though.
@victoraugustolls I started writing unit tests last night, however they use py.test and pytest-mypy-plugins to allow testing the output for both errors and revealed types. However, although py.test is a dev extra, I noticed you aren't using it to run the test suite. Will it be a problem to use py.test to only run the mypy unit tests?
@bryanforbes I don't think we need more than running mypy like we do flake8 in test__sourcecode.py for the main regression testsuite. Here's an example
Hi @bryanforbes ! I'm not a maintainer, but thanks for asking! I agree with @elprans and running mypy should be enough. If mypy is happy it should be fine!
@elprans Since I added a plugin, it's good practice to test the usage of that plugin. For now, I can just do the mypy run and then figure out a low-cost way to test the plugin later.
I added the unit test to do the mypy check and I also did some further cleanups. I think this is ready for review.
@bryanforbes
Since I added a plugin, it's good practice to test the usage of that plugin.
It should be easy to do by making a folder with some Python modules and a mypy.ini referencing the plugin, and then running mypy on the folder.
I think this is ready for review.
Great! I'll take a look tomorrow.
@elprans
It should be easy to do by making a folder with some Python modules and a
mypy.inireferencing the plugin, and then running mypy on the folder.
That may work. I'll give it a shot later today and let you know the outcome.
@elprans I added tests/typing_record.py and a unit test to run mypy on it which checks the output of that run. Let me know if you need that test file renamed.
@elprans in order to take care of the suggestion by @HarrySky, MagicStack/py-pgproto#9 needs approved and merged
Done
@bryanforbes, 0.22 is out, so it is a good time to rebase and get this merged.
What's the status on this?
@elprans @bryanforbes ? 🙂
I need to rebase on the latest master. I may have time to do that in the next month.
The work done here is epic!
Although it seems that the PR is too huge to get merged. What if we split it into smaller parts and merge quickly? Even a small, non-comprehensive portion of type hints could be useful if delivered. It could then be gradually improved one little step at a time.
I'm not sure if you guys are positive about the idea of merging this big PR in few smaller parts, but I was able to extract one part of this gorgeous work fairly easily. If it looks good enough we could merge it even without any additional type hints: #777. Dropping Python 3.5 is a good thing to do anyway. And also it could allow us to read what will be left here (type hints) more thoroughly without extra clutter.
@bryanforbes @elprans
I'm not sure if you guys are positive about the idea of merging this big PR in few smaller parts
I'm not opposed to doing that, but it would mean that the final step would be adding type checking to the tests so CI doesn't fail.
but I was able to extract one part of this gorgeous work fairly easily
Great work! Thank you.
And also it could allow us to read what will be left here (type hints) more thoroughly without extra clutter.
Great point. I'll look forward to seeing this PR cleaned up once yours has landed :).