glue
Pandas DataFrames with columns of dtype == 'object' cannot be saved/restored
Describe the bug
Pandas DataFrames created within glue and added to the data_collection manager may have columns of dtype 'object', which means they cannot be saved/restored by glue (glue.core.state._load_numpy calls np.load() without allow_pickle=True). This is generally not a problem when reading files using the Pandas data_factory (which converts columns), but it does, for instance, cause problems for datasets retrieved from external sources within a glue session.
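To illustrate the NumPy behavior described above outside of glue, here is a minimal sketch that round-trips arrays through an in-memory .npy buffer with np.save and np.load. The roundtrip helper is hypothetical and only mimics the NumPy step; it is not glue's actual serialization code in glue.core.state.

```python
import io

import numpy as np


def roundtrip(arr, allow_pickle=False):
    """Hypothetical helper: save arr to an in-memory .npy buffer and load it back."""
    buf = io.BytesIO()
    np.save(buf, arr)  # np.save pickles the elements of object arrays by default
    buf.seek(0)
    return np.load(buf, allow_pickle=allow_pickle)


# Numeric data round-trips without any pickling.
floats = np.array([1.2, 3.4, 2.9])
print(roundtrip(floats))

# Python strings held in an object array, like a pandas column of strings.
strings = np.array(["r", "q", "s"], dtype=object)
try:
    roundtrip(strings)  # NumPy's default allow_pickle=False rejects the pickled data
except ValueError as exc:
    print("ValueError:", exc)

# Opting in to pickle loading makes the same data readable again.
print(roundtrip(strings, allow_pickle=True))
```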
To Reproduce
Steps to reproduce the behavior:
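- Create a Pandas DataFrame within glue and add it to the data_collection. For instance, one might use the process described in the documentation (here dc refers to the glue data collection):
from pandas import DataFrame
df1 = DataFrame()
df1['a'] = [1.2, 3.4, 2.9]
df1['g'] = ['r', 'q', 's']  # a column of Python strings, stored with dtype 'object'
dc['dataframe'] = df1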
- Save the session (the new Data object is stored as a NumPy array within the session file, since it did not come from an external file)
- Restore the session
- The following error is raised:
ValueError: Object arrays cannot be loaded when allow_pickle=False
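To see which columns of the DataFrame from the steps above would trigger this error, one can check the dtype of each column's NumPy representation. This is a sketch that only uses pandas and NumPy; object_columns is an illustrative name, and the check assumes that a column is affected exactly when its values become an object array.

```python
import pandas as pd

# The DataFrame from the reproduction steps.
df1 = pd.DataFrame()
df1["a"] = [1.2, 3.4, 2.9]
df1["g"] = ["r", "q", "s"]

# Columns whose values become object arrays: np.save pickles these,
# and np.load(..., allow_pickle=False) refuses to read them back.
object_columns = [
    name for name in df1.columns if df1[name].to_numpy().dtype == object
]
print(object_columns)
```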
Expected behavior
Pandas objects created within glue should not break session files.
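We could simply pass allow_pickle=True to np.load(), but perhaps this has undesired side effects? (NumPy disables pickle loading by default because unpickling data from an untrusted file can execute arbitrary code.)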
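As an alternative to enabling allow_pickle, string columns could be written as fixed-width Unicode arrays instead of object arrays, so the loader can keep NumPy's safe default. The sketch below assumes that such a conversion is acceptable for glue's components; to_picklefree, save_array and load_array are hypothetical names, not part of glue, and whether glue would restore the converted arrays identically to the original components is not verified here.

```python
import io

import numpy as np


def to_picklefree(arr):
    """Hypothetical helper: convert an object array of str to a Unicode array.

    Arrays of other dtypes are returned unchanged; object arrays holding
    anything other than str are rejected instead of being pickled.
    """
    if arr.dtype != object:
        return arr
    if all(isinstance(x, str) for x in arr.ravel()):
        return arr.astype(str)  # fixed-width '<U...' dtype, no pickling needed
    raise TypeError("object array contains non-string values")


def save_array(fileobj, arr):
    # allow_pickle=False makes np.save fail loudly rather than pickle anything.
    np.save(fileobj, to_picklefree(arr), allow_pickle=False)


def load_array(fileobj):
    # The loader keeps NumPy's safe default.
    return np.load(fileobj, allow_pickle=False)


buf = io.BytesIO()
save_array(buf, np.array(["r", "q", "s"], dtype=object))
buf.seek(0)
restored = load_array(buf)
print(restored.dtype, restored.tolist())
```

With this design, object arrays containing None or other non-string values raise TypeError instead of being silently pickled, so a real fix would still need a policy for missing values.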
Details:
- Operating System: macOS 12.6
- Python version: 3.9
- Glue version: 1.6
- How you installed glue: conda
Additional context
Sample session file attached: pandas_dataframe_session.glu.gz