pandas2
supported dtypes
Obvious / currently supported (see xref #20):
- integer
- unsigned integer
- float
- complex
- boolean
- datetime (ns)
- datetime w/tz (ns)
- timedelta (ns)
- period
- category
- var length string
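For reference, most of the "currently supported" list maps directly onto pandas constructors today. A quick sketch (var-length strings are stored as `object` here):

```python
import pandas as pd

# A few of the dtypes listed above, as they exist in pandas today.
s_int = pd.Series([1, 2, 3], dtype="int64")
s_uint = pd.Series([1, 2, 3], dtype="uint64")
s_float = pd.Series([1.0, 2.0], dtype="float64")
s_bool = pd.Series([True, False], dtype="bool")
s_dt = pd.Series(pd.to_datetime(["2016-01-01", "2016-01-02"]))     # datetime64[ns]
s_dttz = s_dt.dt.tz_localize("UTC")                                # datetime64[ns, UTC]
s_td = pd.Series(pd.to_timedelta(["1 days", "2 days"]))            # timedelta64[ns]
s_per = pd.Series(pd.period_range("2016-01", periods=2, freq="M")) # period[M]
s_cat = pd.Series(["a", "b", "a"], dtype="category")
s_str = pd.Series(["a", "bb"], dtype=object)                       # var-length string
```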
Informational; we may want to think about the desirability of adding these later:
- datetime/timedelta units (ms, us, D)
- quantity / unit
- decimal #14
- aggregate or nested (list/dict) #25
- flexible / void / object
- coordinates (geo)
- fixed length string
- bytes
- guids (2 x 8 bytes)
- interval
- IP addresses (both v4 and v6)
- fractions, storing numerator and denominator as integers
- union
- structured (xref to #25)
Non-support (try to raise informative errors & point to ready-made solutions):
- void (a variation on `object`)
- non-supported combinations (e.g. arbitrary dtypes, though maybe a pre-defined union)
- different datetime precision & ranges (e.g. ms vs ns)
Under possible:
- date with no time type (edit: on second thought, is this anything more than a `Period[D]`?)
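On the `Period[D]` question: a Period with daily frequency already carries no intra-day time component, which is most of what a plain "date" dtype would provide. A small illustration:

```python
import pandas as pd

# A daily-frequency Period behaves like a calendar date.
d = pd.Period("2016-03-15", freq="D")
print(d)             # 2016-03-15
print(d.start_time)  # 2016-03-15 00:00:00 (underlying timestamp of the span)
print(d + 1)         # 2016-03-16 (integer arithmetic steps by one day)
```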
@jreback wouldn't we just want a way to have user defined dtypes instead of hardcoding a limited list? Can Dynd help with this?
You certainly can have parameterized types, but completely generic types are a recipe for disaster.
What do you think is missing for primitive / logical typing?
What do you mean by parameterized? What types can be parameterized, and by what? The link is broken.
Sorry, I'm a bit lost.
I'm thinking of having a column of distribution objects or linear models or agents with their own attributes.
That's much too high level, though there is potential for another library to build on the pandas type system.
We are talking about columns of primitives.
Parameterized types are things like `datetime64[D]`.
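Concretely, both NumPy and pandas already expose parameterized dtypes, where a base type takes parameters such as a unit or a timezone:

```python
import numpy as np
import pandas as pd

# NumPy's datetime64 is parameterized by its unit:
dt_day = np.dtype("datetime64[D]")
dt_ns = np.dtype("datetime64[ns]")

# pandas has parameterized extension dtypes as well, e.g. a
# timezone-aware datetime parameterized by unit and tz:
tz_dtype = pd.DatetimeTZDtype(tz="UTC")

print(dt_day)    # datetime64[D]
print(tz_dtype)  # datetime64[ns, UTC]
```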
gotcha.
@datnamer either way, pandas needs to have its own metadata implementation (see the logical/physical decoupling discussion in https://pydata.github.io/pandas-design/internal-architecture.html#logical-types-and-physical-storage-decoupling). We do not want to delegate metadata details to a third party library. Data structures and computation are another matter on a case by case basis (i.e. assuming a library conforms to our memory representation expectations, we can use its algorithms). The tight coupling between metadata (numpy dtypes), memory representation, and algorithms/computation is part of why we are in the current mess.
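To make the decoupling argument concrete, here is a hypothetical minimal sketch (not pandas API; all names are invented for illustration) of a logical type that owns the metadata while merely pointing at a physical storage backend:

```python
# Hypothetical sketch, NOT pandas code: logical types carry the
# user-facing semantics; physical storage is a swappable detail.
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalType:
    name: str       # user-facing type name, e.g. "datetime"
    params: tuple   # type parameters, e.g. ("ns", "UTC")
    physical: str   # name of the physical storage, e.g. "int64"


# Two distinct logical types can share one physical representation:
dt_utc = LogicalType("datetime", ("ns", "UTC"), physical="int64")
td = LogicalType("timedelta", ("ns",), physical="int64")

assert dt_utc != td                     # metadata distinguishes them...
assert dt_utc.physical == td.physical   # ...though the storage is identical
```

The point of the sketch is only that equality and dispatch live at the logical layer, so the physical layer (numpy arrays, Arrow buffers, a third-party library's memory) can change without changing user-visible semantics.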
Maybe think about this: https://github.com/pydata/pandas/issues/3443, which is about nested dtypes in a single object. In another vein, we should think about a `union` type (which is essentially a restricted-looking `object` dtype); SFrame has these.
+1 for sparse.
Maybe including the subtype in sparse and categorical dtypes would be useful, like `category[int64]`.
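For what it's worth, later pandas releases did grow this kind of parameterization (spelled slightly differently than `category[int64]`): sparse dtypes carry a subtype and fill value, and categorical dtypes expose the dtype of their categories:

```python
import pandas as pd

# Sparse dtype parameterized by its subtype (and a fill value):
sp = pd.SparseDtype("int64")
print(sp)  # Sparse[int64, 0]

# Categorical dtype whose categories' dtype is recoverable:
cat = pd.CategoricalDtype(categories=[1, 2, 3])
print(cat.categories.dtype)  # int64
```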