Pyright error for custom data type: "Expected type expression but received "_DataTypeClass[Unknown]..."

Open alex-wenzel opened this issue 1 year ago • 1 comments
Describe the bug

I am trying to construct a custom Data Type along the lines of this Boolean example. My goal is to take a raw data column that contains strings in the format "HH:MM:SS" and represent them as integers instead using the coerce() function.

import pandera as pa
import pandas as pd
from pandera import dtypes
from pandera.engines import pandas_engine
from pandera.typing import DataFrame, Series

@pandas_engine.Engine.register_dtype(
	equivalents=["int", pd.Int64Dtype, pd.Int64Dtype()]
)
@dtypes.immutable
class Clocktime(pandas_engine.INT64):
	def coerce(
		self,
		series: pd.Series
	) -> pd.Series:
		raise NotImplementedError

I would expect this code to pass all type checks, but I have the following from Pyright, which highlights the text pandas_engine.INT64 in the class definition:

Expected type expression but received "_DataTypeClass[Unknown] | ((_DataTypeClass[Unknown]) -> _DataTypeClass[Unknown])"
  "(_DataTypeClass[Unknown]) -> _DataTypeClass[Unknown]" is not a class

I'm not sure whether this is a Pyright bug or a Pandera bug. I'm happy to submit it to Pyright instead if you think it belongs there.

Relevant versions

  • Python: 3.10.12

  • Pandas: 2.2.2

  • Pandera: 0.20.1

  • LSP-Pyright (Sublime): 1.4.6

  • Pyright: 1.1.370

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandera.

  • [ ] (optional) I have confirmed this bug exists on the main branch of pandera.

alex-wenzel avatar Jul 09 '24 22:07 alex-wenzel

As an update, using pd.Int64Dtype rather than pandas_engine.INT64 (see below) passes the type checker, but I haven't run anything with it so I don't know if it's functional.

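from typing import cast  # in addition to the imports in the first snippet

# clocktime_to_int is the author's own "HH:MM:SS" -> int helper (its definition is not included in the post)
@pandas_engine.Engine.register_dtype(equivalents=["int", pd.Int64Dtype, pd.Int64Dtype()])
@dtypes.immutable
class Clocktime(pd.Int64Dtype):  ## No pyright error here
	def coerce(
		self,
		series: Series[str]
	) -> Series[int]:
		return cast(Series[int], series.map(clocktime_to_int))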
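
The snippet above calls clocktime_to_int without showing it. For readers who want to try the example, here is a minimal sketch of what such a helper might look like. It assumes the target integer is seconds since midnight, which the post does not state, and the body is hypothetical rather than the author's actual code.

```python
def clocktime_to_int(clocktime: str) -> int:
    """Convert an "HH:MM:SS" string to seconds since midnight.

    Assumption: the integer representation is total seconds; the original
    post only says the strings should become integers.
    """
    # Unpacking raises ValueError if there are not exactly three parts,
    # and int() raises ValueError if a part is not numeric.
    hours, minutes, seconds = (int(part) for part in clocktime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(clocktime_to_int("01:02:03"))  # 1*3600 + 2*60 + 3 = 3723
```

A helper like this can be passed directly to series.map, as in the coerce() method above.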

alex-wenzel avatar Jul 10 '24 18:07 alex-wenzel

Also seeing this issue. It's quite a bummer, since a huge number of my files now have type errors and red squiggles all over them.

It seems to be due to the return type of the @immutable decorator. Note that the Pyright error doesn't occur for dtypes that pass kwargs to @immutable.
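
To illustrate the kind of typing problem described here, the sketch below uses two hypothetical decorators, frozen_union and frozen, that are not pandera's code. Each can be applied bare or with keyword arguments. The first is annotated with a single union return type, which resembles the union in the Pyright message above; the second uses typing.overload so a checker can tell the two call forms apart. This is one common way to type such a decorator, and it is not necessarily what the pandera fix does.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Callable, Optional, Type, TypeVar, Union, overload

T = TypeVar("T")


# Hypothetical decorator with a single union return annotation. A checker
# may not be able to tell whether `@frozen_union` produced a class or a
# decorator function, so using the result as a base class can be flagged.
def frozen_union(
    cls: Optional[Type[T]] = None, **dataclass_kwargs: Any
) -> Union[Type[T], Callable[[Type[T]], Type[T]]]:
    def wrap(klass: Type[T]) -> Type[T]:
        return dataclass(frozen=True, **dataclass_kwargs)(klass)

    return wrap if cls is None else wrap(cls)


# Same runtime behavior, but overloads give each call form its own type.
@overload
def frozen(cls: Type[T]) -> Type[T]: ...
@overload
def frozen(
    cls: None = None, **dataclass_kwargs: Any
) -> Callable[[Type[T]], Type[T]]: ...
def frozen(cls=None, **dataclass_kwargs):
    def wrap(klass):
        return dataclass(frozen=True, **dataclass_kwargs)(klass)

    return wrap if cls is None else wrap(cls)


@frozen  # bare form: matches the first overload and returns the class
class Base:
    value: int = 0


class Child(Base):  # `Base` is typed as a class, so subclassing is accepted
    pass


@frozen(eq=False)  # kwargs form: matches the second overload
class Other:
    value: int = 0
```

At runtime both decorators behave identically; the difference is only in what a static checker can infer about the decorated class.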

cmditch avatar Dec 10 '24 17:12 cmditch

It looks like the error has actually been fixed here. I think we can close this issue now.

cmditch avatar Dec 10 '24 18:12 cmditch

Yep, looks like it, thanks for the heads up! Closing since fixed by #1823.

alex-wenzel avatar Dec 10 '24 18:12 alex-wenzel