pandas2
A "NULL" / "NA" logical type
There are a number of places where we "guess" a type (e.g. np.float64) where there is no reasonable choice, e.g. in the CSV parser code.
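To make the CSV case concrete, here is a small sketch of the guess as I understand it today. The column names and the CSV text are made up for illustration; the only real API used is `pd.read_csv`.

```python
import io

import pandas as pd

# Made-up CSV: column "b" has no values at all, so the parser
# has nothing to infer a type from.
csv_text = "a,b\n1,\n2,\n"

df = pd.read_csv(io.StringIO(csv_text))

# Every entry in "b" is missing ...
all_null = bool(df["b"].isna().all())

# ... yet the column still gets a concrete dtype; float64 is the guess.
guessed_dtype = str(df["b"].dtype)

print(guessed_dtype, all_null)
```

Nothing in the data says "b" is numeric; float64 is simply the fallback the parser picks when it has to choose something.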
Many databases have the notion of a "null" type which can be cast to any other type implicitly. For example, if a column in a DataFrame has null type, then you could cast it to float64 or string and obtain an equivalent column of all NA/null values. This would flow through to concat operations.
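A rough sketch of what "flows through to concat" could mean, in plain Python. Everything here is hypothetical: the `NULL` marker, `promote`, `concat_types` and `cast_null_column` are illustrative names, not pandas APIs, and the "object" fallback is a simplification, not pandas' real promotion rules.

```python
from typing import Dict, Iterable, List

# Hypothetical marker for the proposed null logical type.
NULL = "null"


def promote(a: str, b: str) -> str:
    """Result type when combining two columns with logical types a and b."""
    if a == NULL:
        return b  # null yields to whatever the other side is
    if b == NULL:
        return a
    if a == b:
        return a
    return "object"  # simplistic fallback for mismatched concrete types


def concat_types(types: Iterable[str]) -> str:
    """Fold promote() over the types of the columns being concatenated."""
    result = NULL
    for t in types:
        result = promote(result, t)
    return result


def cast_null_column(length: int, target: str) -> Dict[str, object]:
    """Casting an all-null column gives an all-null column of the target type."""
    values: List[None] = [None] * length
    return {"type": target, "values": values}


print(concat_types([NULL, "string"]))              # string
print(concat_types(["float64", NULL, "float64"]))  # float64
print(concat_types([NULL, NULL]))                  # null
print(cast_null_column(3, "float64"))
```

The point of the sketch is that a null column never forces a type on its neighbours: it acts as the identity element of the promotion, so concatenating it with a string column stays string rather than falling back to object.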
Figuring this out doesn't strike me as urgent, but it would be good to assess how invasive this change would be. In theory it would help with some rough edges, but it may well break user code that assumed a "float64 column of all NaNs".
See here: https://github.com/pandas-dev/pandas/pull/15892#issuecomment-291641391
We have this de facto now:
In [6]: Series([pd.NaT])
Out[6]:
0   NaT
dtype: datetime64[ns]

In [7]: Series([np.nan])
Out[7]:
0   NaN
dtype: float64

In [8]: Series([None])
Out[8]:
0    None
dtype: object
But this is really no good: each of these should simply be a 'NULL' type (or maybe 'ANY') until/unless it is assigned or coerced.