
DataFrames with Categorical dtypes are always pickled

Open dimosped opened this issue 7 years ago • 0 comments

Arctic Version

1.67.1

Arctic Store

# VersionStore

Description of problem and/or code sample that reproduces the issue

The Categorical dtype was introduced in pandas 0.15: http://pandas.pydata.org/pandas-docs/version/0.15/categorical.html

VersionStore's DataFrame serializer can't handle this type:

import pandas as pd
from arctic.serialization.numpy_records import DataFrameSerializer

df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
print(df.dtypes['cat_type'])  # prints category
ser = DataFrameSerializer()
recs = ser._to_records(df)

This blows up:

Traceback (most recent call last):
  File "interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-438cd826562e>", line 7, in <module>
    recs = ser._to_records(df)
  File "arctic/arctic/serialization/numpy_records.py", line 135, in _to_records
    forced_dtype=forced_dtype if forced_dtype is None else forced_dtype[name]))
  File "arctic/arctic/serialization/numpy_records.py", line 32, in _to_primitive
    if arr.dtype.hasobject:
AttributeError: 'CategoricalDtype' object has no attribute 'hasobject'

Even if we avoid the exception by checking whether the hasobject attribute exists, it is still not possible to convert DF -> record array -> DF and keep the Categorical dtype, because it exists only in pandas, not in NumPy: https://pandas.pydata.org/pandas-docs/stable/categorical.html#categorical-is-not-a-numpy-array

import pandas as pd
df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
# Raises AssertionError: the round trip does not preserve the category dtype
assert df.dtypes['cat_type'] == pd.DataFrame.from_records(df.to_records()).dtypes['cat_type']
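
For reference, the hasobject check mentioned above could look roughly like the sketch below. The helper name `has_object_dtype` is hypothetical and is not part of arctic; it only shows how to avoid the AttributeError, and it does not solve the round-trip problem.

```python
import numpy as np
import pandas as pd


def has_object_dtype(arr):
    """Hypothetical guard: True if arr's dtype reports that it holds Python objects."""
    # NumPy dtypes expose `hasobject`; a dtype without that attribute
    # (as in the traceback above) is treated here as "no objects".
    return bool(getattr(arr.dtype, "hasobject", False))


obj_result = has_object_dtype(np.array(["a", None], dtype=object))
int_result = has_object_dtype(np.array([1, 2, 3]))
# The call on a categorical Series no longer raises AttributeError.
cat_result = has_object_dtype(pd.Series(list("ABC")).astype("category"))
```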

Special handling of such DF dtypes is required, possibly by storing metadata about the original DF dtypes and converting the affected column(s) back to the categorical dtype before returning the read DF to the user.
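
A minimal sketch of that idea, assuming the categorical columns are stored as their integer codes and the categories are kept in a separate metadata dict. The function names `to_records_with_meta` and `from_records_with_meta` are hypothetical and are not part of arctic's serializer; how the metadata would actually be persisted alongside the version is left open.

```python
import pandas as pd


def to_records_with_meta(df):
    """Return a NumPy record array plus metadata describing categorical columns."""
    meta = {}
    plain = df.copy()
    for name in df.columns:
        col = df[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            meta[name] = {
                "categories": list(col.cat.categories),
                "ordered": bool(col.cat.ordered),
            }
            # Store the integer codes, which NumPy can represent (-1 marks missing values).
            plain[name] = col.cat.codes
    return plain.to_records(index=False), meta


def from_records_with_meta(recs, meta):
    """Rebuild the DataFrame and restore the categorical columns from the metadata."""
    df = pd.DataFrame.from_records(recs)
    for name, info in meta.items():
        df[name] = pd.Categorical.from_codes(
            df[name].to_numpy(),
            categories=info["categories"],
            ordered=info["ordered"],
        )
    return df


df = pd.DataFrame({'cat_type': pd.Series(list("ABC")).astype('category')})
recs, meta = to_records_with_meta(df)
restored = from_records_with_meta(recs, meta)
print(restored.dtypes['cat_type'])  # category
```

The index is dropped here for brevity (`index=False`); a real implementation would also need to round-trip the index and handle categorical index levels.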

dimosped avatar Aug 02 '18 12:08 dimosped