pandas2
API: .to_numpy()
xref https://github.com/pandas-dev/pandas/issues/14052
Currently we have an (implicit) numpy conversion when we access .values of a 1D object (Series). This mostly returns a numpy array, though we do return numpy-like objects for several dtypes:
- categorical: we simply return a Categorical object
- datetime tz-aware: we return a datetime64[ns] array in UTC (losing the tz)
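
To make the two 1D cases concrete, here is a small sketch. The series contents and the US/Eastern zone are my own illustrative choices, and the exact datetime unit may differ between pandas versions, so the comments only describe the kind of object returned.

```python
import numpy as np
import pandas as pd

# Categorical Series: .values hands back a Categorical, not an ndarray.
cat = pd.Series(["a", "b", "a"], dtype="category")
cat_values = cat.values
print(type(cat_values).__name__)

# tz-aware Series: .values is a plain datetime64 ndarray (UTC instants),
# so the time zone information is no longer attached to the result.
aware = pd.Series(pd.date_range("2016-01-01", periods=2, tz="US/Eastern"))
aware_values = aware.values
print(type(aware_values).__name__, aware_values.dtype)
```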
This also has implications when we have a 2D object (DataFrame). We use a type that can safely hold all of the data:
- int & floats -> floats
- datetime w/tz -> object array
- object & anything -> object array
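
The 2D rules in the list above can be checked with a toy frame. The column names and values are made up for illustration; each slice mixes two dtypes and we look at the dtype of the resulting array.

```python
import pandas as pd

df = pd.DataFrame({
    "i": [1, 2],
    "f": [0.5, 1.5],
    "t": pd.date_range("2016-01-01", periods=2, tz="US/Eastern"),
    "s": pd.Series(["x", "y"], dtype=object),
})

# int & float -> a float array
int_float = df[["i", "f"]].values
# tz-aware datetime mixed with another dtype -> an object array
tz_mixed = df[["i", "t"]].values
# object & anything -> an object array
obj_mixed = df[["i", "s"]].values

print(int_float.dtype, tz_mixed.dtype, obj_mixed.dtype)
```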
So generally this is OK for 2D, in that you preserve as much as possible (though of course you must copy / return a heavyweight object array at times).
So we need some thought on how this API should look and how to validate these cases.
I would propose .to_numpy() (a function, so we can potentially pass options). It won't break the current API (which I think we can preserve / provide back-compat for), and it avoids making libpandas jump through hoops to support the 'old' stuff.
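
As a rough illustration of what "a function, so we can pass options" buys us: a method with this name later shipped in pandas (0.24 and up) with dtype and copy parameters. The sketch below uses that later signature, which is not necessarily what this thread had in mind at the time, and the data is made up.

```python
import pandas as pd

aware = pd.Series(pd.date_range("2016-01-01", periods=2, tz="US/Eastern"))

# Ask for object dtype: each element is a Timestamp that keeps its tz.
as_objects = aware.to_numpy(dtype=object)

# Ask for datetime64[ns]: a plain ndarray of UTC instants, tz dropped.
as_utc = aware.to_numpy(dtype="datetime64[ns]")

# copy=True guarantees the result does not share memory with the Series,
# so writing to it leaves the original data untouched.
ints = pd.Series([1, 2, 3])
detached = ints.to_numpy(copy=True)
detached[0] = 99

print(as_objects[0].tz, as_utc.dtype, ints.iloc[0])
```

The point is that the caller states the trade-off explicitly (keep the tz and pay for an object array, or drop it and get a native array) instead of relying on whatever .values happens to do for that dtype.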
I agree with this -- it would be helpful to start migrating away from the .values API toward something more explicit, to ease the burden. We might even want to introduce a logging layer into pandas 1.0 to alert users when they use "non-future-proof" APIs.
Right, I suppose we could instrument things, maybe via an option
pandas.options.future.logging='warn'|'raise'|'ignore'
and namespace further options under .future.* if we need to.
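
A minimal sketch of how such a switch might behave, in plain Python. Everything here is hypothetical: pandas has no options.future.logging, and FUTURE_LOGGING, LegacyAPIError and note_legacy_api are names invented for this example; only the three modes come from the suggestion above.

```python
import warnings

# Hypothetical stand-in for pandas.options.future.logging.
FUTURE_LOGGING = "warn"


class LegacyAPIError(RuntimeError):
    """Hypothetical error raised when the mode is 'raise'."""


def note_legacy_api(name, mode=None):
    """Report use of a non-future-proof API according to the mode."""
    mode = FUTURE_LOGGING if mode is None else mode
    message = f"{name} is not future-proof; prefer an explicit alternative"
    if mode == "ignore":
        return
    if mode == "warn":
        # stacklevel=2 points the warning at the caller of this helper
        warnings.warn(message, FutureWarning, stacklevel=2)
        return
    if mode == "raise":
        raise LegacyAPIError(message)
    raise ValueError(f"unknown mode: {mode!r}")


# A legacy accessor would call the helper before doing its work.
note_legacy_api(".values", mode="ignore")
```

Keeping the mode in one place means a test suite could flip it to 'raise' to find every legacy call, while end users stay on 'warn' or 'ignore'.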