DataFrames with Categorical dtypes are always pickled
Arctic Version
1.67.1
Arctic Store
# VersionStore
Description of problem and/or code sample that reproduces the issue
The Categorical dtype was introduced in Pandas 0.15: http://pandas.pydata.org/pandas-docs/version/0.15/categorical.html
VersionStore's DataFrame serializer can't handle this type:
import pandas as pd
from arctic.serialization.numpy_records import DataFrameSerializer
df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
print(df.dtypes['cat_type'])  # prints category
ser = DataFrameSerializer()
recs = ser._to_records(df)
This blows up with:
Traceback (most recent call last):
File "interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-438cd826562e>", line 7, in <module>
recs = ser._to_records(df)
File "arctic/arctic/serialization/numpy_records.py", line 135, in _to_records
forced_dtype=forced_dtype if forced_dtype is None else forced_dtype[name]))
File "arctic/arctic/serialization/numpy_records.py", line 32, in _to_primitive
if arr.dtype.hasobject:
AttributeError: 'CategoricalDtype' object has no attribute 'hasobject'
Even if we avoid the exception by checking whether the hasobject attribute exists, it is still not possible to convert DF -> rec array -> DF and keep the Categorical dtype, because it exists only in Pandas, not in numpy: https://pandas.pydata.org/pandas-docs/stable/categorical.html#categorical-is-not-a-numpy-array
import pandas as pd
df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
# Fails with AssertionError: the round trip does not preserve the category dtype
assert df.dtypes['cat_type'] == pd.DataFrame.from_records(df.to_records()).dtypes['cat_type']
Special handling of such DF dtypes is required, possibly by storing metadata about the original DF dtypes and converting the affected column(s) back to the categorical dtype before returning the read DF to the user.
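
A minimal sketch of that idea, outside of Arctic, might look like the following. The helper names encode_categoricals and decode_categoricals are hypothetical and are not part of Arctic's API. The sketch assumes unique column names and a reasonably recent Pandas that exposes pd.api.types.CategoricalDtype. It replaces each categorical column with its integer codes, keeps the categories and the ordered flag as metadata, and rebuilds the categorical columns after the record-array round trip.

```python
import pandas as pd


def encode_categoricals(df):
    """Hypothetical helper: swap categorical columns for their integer codes.

    Returns a numpy-friendly copy of the frame plus the metadata needed
    to rebuild the categorical columns later.
    """
    plain = df.copy()
    meta = {}
    for name in df.columns:
        col = df[name]
        if isinstance(col.dtype, pd.api.types.CategoricalDtype):
            meta[name] = {
                "categories": list(col.cat.categories),
                "ordered": bool(col.cat.ordered),
            }
            # Codes are plain integers (-1 marks a missing value).
            plain[name] = col.cat.codes
    return plain, meta


def decode_categoricals(plain, meta):
    """Hypothetical helper: rebuild categorical columns from codes + metadata."""
    restored = plain.copy()
    for name, info in meta.items():
        restored[name] = pd.Categorical.from_codes(
            restored[name].to_numpy(),
            categories=info["categories"],
            ordered=info["ordered"],
        )
    return restored


df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
plain, meta = encode_categoricals(df)

# Round trip through a numpy record array, as a serializer would.
recs = plain.to_records(index=False)
restored = decode_categoricals(pd.DataFrame.from_records(recs), meta)
```

In a real fix the metadata would have to be persisted alongside the stored version so that the read path can apply the decoding step; where exactly that belongs in VersionStore is left open here.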