jupyterlab-h5web
jupyterlab-h5web copied to clipboard
region reference attributes are not supported
Attempting to open a dataset with a region reference attribute results in the following error:
TypeError: Object of type RegionReference is not JSON serializable
Steps to reproduce
Create the following HDF5 file:
import h5py
import numpy as np
with h5py.File("region_bug.h5", "w") as h5:
data = h5.create_dataset("data", data=np.ones((3,3)))
h5.attrs["dataref"] = data.regionref[0]
Attempt to open region_bug.h5
in Jupyter Lab
Thanks for the report ! This is a back-end issue but we can do something about it.
I did not use RegionReference
myself so I would need a bit more info.
Would you want the attribute value to be visible ? If so, what should it show ? Or, would it be acceptable to skip completely the attribute ?
I think ideally the attribute should show the referenced value, so for example
with h5py.File("region_bug.h5", "w") as h5:
data = h5.create_dataset("data", data=np.array([1, 2, 42]))
h5.attrs["answer"] = data.regionref[2]
Should be displayed as
answer 42
Though it should probably also indicate that it's a reference - the output of
h5dump region_bug.h5
is
HDF5 "region_bug.h5" {
GROUP "/" {
ATTRIBUTE "answer" {
DATATYPE H5T_REFERENCE { H5T_STD_REF_DSETREG }
DATASPACE SCALAR
DATA {
(0): DATASET /data {(2)-(2)}
}
}
DATASET "data" {
DATATYPE H5T_STD_I64LE
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): 1, 2, 42
}
}
}
}
So maybe
answer (ref /data {(2)-(2)}) 42
Since the region reference can refer to any slicing in a dataset, displaying the values in the "Inspect" tab can be problematic.
This issue affects h5py.Reference
as well as h5py.RegionReference
. I encountered this when using HDF5 documents that make use of the scales feature, which implements some references between datasets. Here's a short script to reproduce the error:
from jupyterlab_h5web import H5Web
import h5py
import numpy as np
# Generate data
x = np.linspace(-1, 1, 11)
y = np.linspace(-1.5, 1.5, 31)
xx, yy = np.meshgrid(x, y, indexing='ij')
mydata = np.exp(-np.sqrt(xx**2 + yy**2))
filename = 'Test.h5'
with h5py.File(filename, 'w') as f:
# Add datasets to file
data_dset = f.create_dataset(name='mydata', data=mydata)
x_dset = f.create_dataset(name='x', data=x)
y_dset = f.create_dataset(name='y', data=y)
# Make x and y into scales
x_dset.make_scale(name='x')
y_dset.make_scale(name='y')
# Attach scales to dataset
data_dset.dims[0].attach_scale(x_dset)
data_dset.dims[1].attach_scale(y_dset)
H5Web(filename)
Running this generates the following error:
TypeError: Type is not JSON serializable: h5py.h5r.Reference
In the meantime, one can modify AttributeHandler.get_content()
to ignore the problematic entries, but I think a more general solution would be appropriate
Thank you for giving additional information ! We are not using these features ourselves so it was hard to gain insight on how to solve the problem.
I will have a second look at this
v0.0.12 of the extension makes it compatible with h5grove latest version v0.0.11 (the package serving HDF5 contents). This version no longer crash on Reference
but converts them as strings.
You can try by updating the extension. This should install h5grove
latest version (this can be ensured by running pip install -U h5grove
afterwards).
I concur that converting references as strings does not give much useful info so I will try later to implement something more involved to at least give the name of the dataset it refers to.
Thank you for the update and for looking into this! I tested it and it to work fine. I agree that converting to string isn't ideal, but it's a good enough fix for now for preventing crashes. Thanks again!
I concur that converting references as strings does not give much useful info so I will try later to implement something more involved to at least give the name of the dataset it refers to.
I remember investigating this a while ago and it did not amount of anything because the reference needs outside information to be resolved. It was some weird stuff like it needs to have the file id
of the entity it points to, which I don't know how to deduce consistently.
I am afraid we will have to do with strings for now.