Do something more sensible with data from pandas
- unyt version: v2.2.0+7.g5d3ace5
- Python version: 3.6.8
- Operating System: Ubuntu 18.04
Description
If you multiply a pandas Series (for example, a DataFrame column) by a unit, you get back a plain Series that doesn't actually have any units:
In [1]: import unyt as u
In [2]: import pandas as pd
In [3]: data = pd.read_csv('/home/goldbaum/Documents/rc-co2monitor/co2data.csv')
In [4]: t = data['Temperature']*u.degC
In [5]: t.units
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-7e2982815421> in <module>
----> 1 t.units
~/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068
5069 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'units'
In [6]: type(t)
Out[6]: pandas.core.series.Series
Adding full support for pandas data types may be a lot to ask for. In that case, we should at least detect when we're handed a pandas Series or DataFrame (preferably without needing to actually import pandas) and raise an error telling the user to convert the data to NumPy arrays first.
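A minimal sketch of what that detection might look like, assuming we identify pandas objects by inspecting the class hierarchy of the input rather than importing pandas. The names `is_pandas_object` and `reject_pandas` are hypothetical and not part of unyt:

```python
def is_pandas_object(obj):
    """Return True if obj looks like a pandas Series or DataFrame.

    Hypothetical helper: it inspects class names and modules only,
    so pandas itself is never imported here.
    """
    for cls in type(obj).__mro__:
        top_level_module = cls.__module__.split(".")[0]
        if top_level_module == "pandas" and cls.__name__ in ("Series", "DataFrame"):
            return True
    return False


def reject_pandas(obj):
    """Raise a TypeError for pandas input; return other objects unchanged."""
    if is_pandas_object(obj):
        raise TypeError(
            "Cannot attach units to a pandas %s; convert it to a NumPy "
            "array first, e.g. with .to_numpy()." % type(obj).__name__
        )
    return obj
```

Walking the MRO means subclasses of Series or DataFrame are caught as well. Where such a check would be called inside unyt is left open here.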
Another option, and one that touches unyt only lightly, is to register an accessor with pandas. I have prototyped this and usage looks like:
>>> import pandas as pd
>>> import unyt
>>> data = pd.DataFrame({"Temperature":[0.0, 23.0, 55.0]})
>>> data.Temperature.unyt.set_units("degC")
unyt_array([ 0., 23., 55.], 'degC')
Is this approach of interest?
I’d probably need to see more details on how this would work inside a pandas workflow. Feel free to open a PR but please do include some usage examples that demonstrate how this would be useful.
I’d also like it if we could avoid importing pandas (or at least delay importing it until it’s needed), since importing it eagerly would increase the import time of the whole library.
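One way the two requests could fit together, sketched under assumptions: the accessor is registered by an explicit opt-in function, so pandas is imported only when that function is called. The function name `register_pandas_accessor` and the class `UnytSeriesAccessor` are hypothetical; this is not the prototype mentioned above, only an illustration that uses pandas' public `register_series_accessor` decorator.

```python
def register_pandas_accessor(name="unyt"):
    """Hypothetical opt-in hook that registers a Series accessor.

    Both imports are deferred to call time, so merely defining this
    function does not add pandas to the library's import cost.
    """
    import pandas as pd
    from unyt import unyt_array

    @pd.api.extensions.register_series_accessor(name)
    class UnytSeriesAccessor:
        def __init__(self, series):
            self._series = series

        def set_units(self, units):
            # Wrap the Series values in a unyt_array with the given units.
            return unyt_array(self._series.to_numpy(), units)

    return UnytSeriesAccessor
```

After calling `register_pandas_accessor()` once, `data.Temperature.unyt.set_units("degC")` would be available in the same way as in the prototype's usage example. The trade-off is that users have to opt in explicitly instead of getting the accessor automatically on `import unyt`.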