dask icon indicating copy to clipboard operation
dask copied to clipboard

Type hints for read_bytes

Open douglasdavis opened this issue 3 years ago • 4 comments

  • [x] Passes pre-commit run --all-files

douglasdavis avatar Aug 29 '22 16:08 douglasdavis

CI failure unrelated

douglasdavis avatar Aug 29 '22 17:08 douglasdavis

@ian-r-rose thanks for taking a look!

douglasdavis avatar Aug 29 '22 18:08 douglasdavis

Sorry for the delay in getting back to this @douglasdavis!

Unfortunately, I think you still need to give the overloads defaults in order for type inference to work. For example, on this branch the following fails to type check:

from dask.bytes import read_bytes

reveal_type(read_bytes("path/to/stuff", include_path=True))

Dealing with literals in default keyword-arguments has been a persistently annoying thing in mypy (see the long discussion in https://github.com/python/mypy/issues/6580), but I do think it's doable using the special ellipsis symbol. It's not very well documented, but see here.

So I was able to get it working with this diff:

diff --git a/dask/bytes/core.py b/dask/bytes/core.py
index 0164d024d..8b783cba6 100644
--- a/dask/bytes/core.py
+++ b/dask/bytes/core.py
@@ -15,28 +15,28 @@ from dask.utils import is_integer, parse_bytes
 @overload
 def read_bytes(
     urlpath: str | list[str],
-    delimiter: bytes | None,
-    not_zero: bool,
-    blocksize: str | int | None,
-    sample: int | str | bool,
-    compression: str | None,
-    include_path: Literal[True],
+    delimiter: bytes | None = None,
+    not_zero: bool = False,
+    blocksize: str | int | None = None,
+    sample: int | str | bool = "10 kiB",
+    compression: str | None = None,
+    include_path: Literal[False] = False,
     **kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]], list[str]]:
+) -> tuple[bytes, list[list[Delayed]]]:
     ...
 
 
 @overload
 def read_bytes(
     urlpath: str | list[str],
-    delimiter: bytes | None,
-    not_zero: bool,
-    blocksize: str | int | None,
-    sample: int | str | bool,
-    compression: str | None,
-    include_path: Literal[False],
+    delimiter: bytes | None = None,
+    not_zero: bool = False,
+    blocksize: str | int | None = None,
+    sample: int | str | bool = "10 kiB",
+    compression: str | None = None,
+    include_path: Literal[True] = ...,
     **kwargs: Any,
-) -> tuple[bytes, list[list[Delayed]]]:
+) -> tuple[bytes, list[list[Delayed]], list[str]]:
     ...
 

ian-r-rose avatar Sep 02 '22 18:09 ian-r-rose

Thanks, @ian-r-rose! This does some seem like quite the tough case for mypy 😛 with your diff I'm seeing this error from mypy (version: mypy 0.971 (compiled: yes)) if I run mypy dask/bytes/core.py:

dask/bytes/core.py:16: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [misc]

but the example code snippet you wrote does indeed work. I'm wondering how to make the mypy check on the dask codebase work. At this point I'd be OK with having an Any return typehint if this is getting too in the weeds

douglasdavis avatar Sep 06 '22 21:09 douglasdavis