polars icon indicating copy to clipboard operation
polars copied to clipboard

fix(python): use explicit type-arg for PythonDataType

Open josh opened this issue 2 years ago • 2 comments

Bumped into this issue trying to use pl.from_dicts. Type checkers in strict modes seem to not like SchemaDefinition -> PythonDataType using list and tuple without explicit args. I believe passing Any is essentially the default behavior. This just makes that explicit.

Edit

Switched to using typing.List and typing.Tuple for Python 3.7 compatible. This file also already contains a List class used for a different purpose.

josh avatar Jan 27 '23 04:01 josh

Switched to using typing.List and typing.Tuple for Python 3.7 compatible. This file also already contains a List class used for a different purpose.

This shouldn't be necessary, given that from __future__ import annotations is present?

alexander-beedie avatar Jan 27 '23 13:01 alexander-beedie

This shouldn't be necessary, given that from __future__ import annotations is present?

I tried at first, but CI failed with mypy erroring:

polars/datatypes.py:82: error: "list" is not subscriptable  [misc]
polars/datatypes.py:83: error: "tuple" is not subscriptable  [misc]

https://github.com/pola-rs/polars/actions/runs/4021457727/jobs/6910343984

josh avatar Jan 27 '23 16:01 josh

I'm having trouble reproducing your issue. Could you provide a minimum working example where the type checker fails?

stinodego avatar Feb 01 '23 20:02 stinodego

I'm having trouble reproducing your issue. Could you provide a minimum working example where the type checker fails?

Ah, I should I mentioned I'm using pyright not mypy.

# pyright: strict
import polars as pl
pl.from_dict({"a": [1, 2], "b": [3, 4]})
$ pyright example.py
example.py:3:1 - error: Type of "from_dict" is partially unknown
    Type of "from_dict" is "(data: Mapping[str, Sequence[object] | Mapping[str, Sequence[object]] | Series], schema: Sequence[str] | Mapping[str, DataTypeClass | DataType | Type[int] | Type[float] | Type[bool] | Type[str] | Type[date] | Type[time] | Type[datetime] | Type[timedelta] | Type[list[Unknown]] | Type[tuple[Unknown, ...]] | Type[bytes] | Type[Decimal]] | Sequence[str | Tuple[str, DataTypeClass | DataType | Type[int] | Type[float] | Type[bool] | Type[str] | Type[date] | Type[time] | Type[datetime] | Type[timedelta] | Type[list[Unknown]] | Type[tuple[Unknown, ...]] | Type[bytes] | Type[Decimal] | None]] | None = None, *, schema_overrides: Mapping[str, DataTypeClass | DataType] | None = None) -> DataFrame" (reportUnknownMemberType)
1 error, 0 warnings, 0 informations

More isolated example:

# pyright: strict
import polars.datatypes
polars.datatypes.PythonDataType
$ pyright example.py
example.py:4:1 - error: Type of "PythonDataType" is partially unknown
    Type of "PythonDataType" is "Type[Type[int]] | Type[Type[float]] | Type[Type[bool]] | Type[Type[str]] | Type[Type[date]] | Type[Type[time]] | Type[Type[datetime]] | Type[Type[timedelta]] | Type[Type[list[Unknown]]] | Type[Type[tuple[Unknown, ...]]] | Type[Type[bytes]] | Type[Type[Decimal]]" (reportUnknownMemberType)

josh avatar Feb 04 '23 23:02 josh

Sorry to leave this hanging for a while. I had to verify any false positives/negatives this could reproduce, as I knew I had looked at this before.

Since the type is object, I thought it might simply allow all objects now, but that is not the case:

Nested: TypeAlias = Union[
    Type[ListType[Any]],
    Type[Tuple[Any, ...]],
]

type_map: dict[Nested, str] = {
    list: "List!",
    tuple: "Tuple!",
    dict: "Dict!",  # error: Dict entry 2 has incompatible type "Type[int]": "str"; expected "Union[Type[List[Any]], Type[Tuple[Any, ...]]]": "str"  [dict-item]
}

So this solves a type: ignore comment and apparently pleases the pyright type checker. That's good enough for me, let's merge this!

Thanks @josh !

EDIT: As an aside:

Switched to using typing.List and typing.Tuple for Python 3.7 compatible. This file also already contains a List class used for a different purpose.

This shouldn't be necessary, given that from __future__ import annotations is present?

Type aliases are not affected in the same way as actual type annotations on functions/methods by the future import. I couldn't tell you the details, but I know I've bumped into it before. I believe you can work around this by using strings, but I believe this is more readable in this case.

stinodego avatar Feb 13 '23 09:02 stinodego