h5pyd
Cannot access compound dataset which contains array of enum
I am trying to access a dataset that contains an array of enums via h5serv; however, h5pyd throws the following exception:
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/group.py", line 335, in __getitem__
tgt = getObjByUuid(link_json['collection'], link_json['id'])
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/group.py", line 311, in getObjByUuid
tgt = Dataset(DatasetID(self, dataset_json))
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/dataset.py", line 416, in __init__
self._dtype = createDataType(self.id.type_json)
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 725, in createDataType
dt = createDataType(field['type']) # recursive call
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 732, in createDataType
dtRet = createBaseDataType(typeItem) # create non-compound dt
File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 638, in createBaseDataType
raise TypeError("Array Type base type must be integer, float, or string")
TypeError: Array Type base type must be integer, float, or string
We can create a minimal dataset to reproduce the error using h5py as follows:
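import h5py
import numpy as np

# Enum with an int32 base type
enum_type = h5py.special_dtype(enum=('i', {"FOO": 0, "BAR": 1, "BAZ": 2}))
# Compound type: a 10-element array of the enum, an int, and a 32-byte string.
# np.bytes_ rather than np.str_: on Python 3, np.str_ is unicode, which h5py cannot store.
comp_type = np.dtype([('my_enum_array', enum_type, 10), ('my_int', 'i'), ('my_string', np.bytes_, 32)])

with h5py.File('test.h5', 'w') as f:
    f.create_dataset("test", (4,), comp_type)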
We then put it in h5serv's data directory and try to access it:
import h5pyd
f = h5pyd.File("test.hdfgroup.org", endpoint="http://127.0.0.1:5000")
print(f['test'])
This yields the above exception. Note that we are able to access the dataset as expected using regular h5py.
Applying the following patch to h5pyd prevents the exception and returns the dataset; however, the result doesn't seem correct (the enum array appears to be treated as a plain int array):
diff --git a/h5pyd/_hl/h5type.py b/h5pyd/_hl/h5type.py
index 4ce6cb4..10ce562 100644
--- a/h5pyd/_hl/h5type.py
+++ b/h5pyd/_hl/h5type.py
@@ -637 +637 @@ def createBaseDataType(typeItem):
- if arrayBaseType["class"] not in ('H5T_INTEGER', 'H5T_FLOAT', 'H5T_STRING'):
+ if arrayBaseType["class"] not in ('H5T_INTEGER', 'H5T_FLOAT', 'H5T_STRING', 'H5T_ENUM'):
I'm not sure how best to work around this properly. Thanks in advance for your advice.
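For reference, here is a minimal sketch of what keeping the enum mapping could look like when a JSON type description is turned into a NumPy dtype. It assumes h5serv describes types in the HDF5/JSON layout (an H5T_ARRAY with "dims" and a nested "base", and an H5T_ENUM with an integer "base" plus a "mapping" dict). json_to_dtype and _INT_BASES are hypothetical names, not h5pyd's actual createDataType, and only the little-endian integer, enum and array classes needed for this example are handled. The idea is to recurse into the array's base type and attach the mapping as {"enum": ...} dtype metadata, which is where h5py's check_dtype(enum=...) looks, instead of letting the enum collapse to a plain integer as it does with the patch above.

```python
import numpy as np

# Hypothetical lookup from HDF5/JSON integer base names to NumPy type codes
# (little-endian only, for brevity).
_INT_BASES = {
    "H5T_STD_I8LE": "<i1", "H5T_STD_U8LE": "<u1",
    "H5T_STD_I16LE": "<i2", "H5T_STD_U16LE": "<u2",
    "H5T_STD_I32LE": "<i4", "H5T_STD_U32LE": "<u4",
    "H5T_STD_I64LE": "<i8", "H5T_STD_U64LE": "<u8",
}


def json_to_dtype(type_item):
    """Hypothetical converter: HDF5/JSON type dict -> NumPy dtype."""
    cls = type_item["class"]
    if cls == "H5T_INTEGER":
        return np.dtype(_INT_BASES[type_item["base"]])
    if cls == "H5T_ENUM":
        # Build the underlying integer type, then attach the name -> value
        # mapping under the "enum" metadata key.
        base = json_to_dtype(type_item["base"])
        return np.dtype(base, metadata={"enum": dict(type_item["mapping"])})
    if cls == "H5T_ARRAY":
        # Recurse into the element type so an enum base keeps its mapping,
        # then wrap it in a subarray dtype with the given dims.
        base = json_to_dtype(type_item["base"])
        return np.dtype((base, tuple(type_item["dims"])))
    raise TypeError("unsupported type class in this sketch: {}".format(cls))


# The array-of-enum field from the issue, written in the assumed JSON layout.
enum_array = {
    "class": "H5T_ARRAY",
    "dims": [10],
    "base": {
        "class": "H5T_ENUM",
        "base": {"class": "H5T_INTEGER", "base": "H5T_STD_I32LE"},
        "mapping": {"FOO": 0, "BAR": 1, "BAZ": 2},
    },
}

dt = json_to_dtype(enum_array)
print(dt.shape, dict(dt.base.metadata))
```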
Hi, it looks like the test coverage for enum types is pretty thin - we'll want to beef this up.
I'm a bit confused even when using just h5py with your HDF5 file.
If I do this:
import h5py

f = h5py.File("test.h5", 'r')
dset = f['test']
print(dset.dtype)
dt = dset.dtype["my_enum_array"]   # dtype of the enum-array field
print("enum dt: {}".format(dt))
print(h5py.check_dtype(enum=dt))
I'm getting "None" for the last output line. Is this what you see?
Yes, it seems that the metadata is lost if we access it that way. However, if I write f['test']['my_enum_array'].dtype.metadata (or, equivalently, h5py.check_dtype(enum=f['test']['my_enum_array'].dtype)), the enum dictionary is retrieved as expected. This is pretty confusing behavior indeed.
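A NumPy-only sketch of what may be going on. It assumes (without checking h5py's internals) that h5py represents an HDF5 enum as an integer dtype with metadata={"enum": {...}} and an HDF5 array as a NumPy subarray dtype wrapping that element type. If so, dset.dtype["my_enum_array"] is the subarray wrapper, which has no metadata of its own, while the mapping lives on its .base. Reading the field expands the subarray into extra array dimensions with the element dtype, which would explain why f['test']['my_enum_array'].dtype.metadata finds the mapping. Under the same assumption, h5py.check_dtype(enum=dt.base) should return it as well.

```python
import numpy as np

# Assumed h5py representation of the enum: an int32 dtype carrying the
# name -> value mapping as metadata.
enum_dt = np.dtype("<i4", metadata={"enum": {"FOO": 0, "BAR": 1, "BAZ": 2}})

# Same layout as the compound type from the issue.
comp = np.dtype([("my_enum_array", enum_dt, (10,)),
                 ("my_int", "<i4"),
                 ("my_string", "S32")])

# Looking the field up on the dtype gives the subarray wrapper
# (analogous to dset.dtype["my_enum_array"]).
field_dt = comp["my_enum_array"]
print(field_dt.shape, field_dt.metadata)   # the wrapper itself has no metadata
print(dict(field_dt.base.metadata))        # the element dtype keeps the mapping

# Reading the field from an array expands the subarray into extra dimensions,
# so the resulting array's dtype is the element dtype, metadata included
# (analogous to f['test']['my_enum_array'].dtype.metadata).
values = np.zeros(4, dtype=comp)["my_enum_array"]
print(values.shape, dict(values.dtype.metadata))
```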