jupyterlab-h5web icon indicating copy to clipboard operation
jupyterlab-h5web copied to clipboard

region reference attributes are not supported

Open planetmarshall opened this issue 3 years ago • 7 comments

Attempting to open a dataset with a region reference attribute results in the following error:

TypeError: Object of type RegionReference is not JSON serializable

Steps to reproduce

Create the following HDF5 file:

import h5py
import numpy as np

with h5py.File("region_bug.h5", "w") as h5:
    data = h5.create_dataset("data", data=np.ones((3,3)))
    h5.attrs["dataref"] = data.regionref[0]

Attempt to open region_bug.h5 in Jupyter Lab

planetmarshall avatar May 10 '21 11:05 planetmarshall

Thanks for the report ! This is a back-end issue but we can do something about it.

I did not use RegionReference myself so I would need a bit more info.

Would you want the attribute value to be visible ? If so, what should it show ? Or, would it be acceptable to skip completely the attribute ?

loichuder avatar May 10 '21 14:05 loichuder

I think ideally the attribute should show the referenced value, so for example

with h5py.File("region_bug.h5", "w") as h5:
    data = h5.create_dataset("data", data=np.array([1, 2, 42]))
    h5.attrs["answer"] = data.regionref[2]

Should be displayed as

answer 42

Though it should probably also indicate that it's a reference - the output of

h5dump region_bug.h5

is

HDF5 "region_bug.h5" {
GROUP "/" {
   ATTRIBUTE "answer" {
      DATATYPE  H5T_REFERENCE { H5T_STD_REF_DSETREG }
      DATASPACE  SCALAR
      DATA {
      (0): DATASET /data {(2)-(2)}
      }
   }
   DATASET "data" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): 1, 2, 42
      }
   }
}
}

So maybe

answer (ref /data {(2)-(2)}) 42

planetmarshall avatar May 10 '21 16:05 planetmarshall

Since the region reference can refer to any slicing in a dataset, displaying the values in the "Inspect" tab can be problematic.

t20100 avatar May 18 '21 12:05 t20100

This issue affects h5py.Reference as well as h5py.RegionReference. I encountered this when using HDF5 documents that make use of the scales feature, which implements some references between datasets. Here's a short script to reproduce the error:

from jupyterlab_h5web import H5Web
import h5py
import numpy as np

# Generate data
x = np.linspace(-1, 1, 11)
y = np.linspace(-1.5, 1.5, 31)
xx, yy = np.meshgrid(x, y, indexing='ij')
mydata = np.exp(-np.sqrt(xx**2 + yy**2))

filename = 'Test.h5'
with h5py.File(filename, 'w') as f:
    # Add datasets to file
    data_dset = f.create_dataset(name='mydata', data=mydata)
    x_dset = f.create_dataset(name='x', data=x)
    y_dset = f.create_dataset(name='y', data=y)
    
    # Make x and y into scales
    x_dset.make_scale(name='x')
    y_dset.make_scale(name='y')
    
    # Attach scales to dataset
    data_dset.dims[0].attach_scale(x_dset)
    data_dset.dims[1].attach_scale(y_dset)

H5Web(filename)

Running this generates the following error:

TypeError: Type is not JSON serializable: h5py.h5r.Reference

In the meantime, one can modify AttributeHandler.get_content() to ignore the problematic entries, but I think a more general solution would be appropriate

cherbon1 avatar Oct 25 '21 10:10 cherbon1

Thank you for giving additional information ! We are not using these features ourselves so it was hard to gain insight on how to solve the problem.

I will have a second look at this

loichuder avatar Oct 25 '21 11:10 loichuder

v0.0.12 of the extension makes it compatible with h5grove latest version v0.0.11 (the package serving HDF5 contents). This version no longer crash on Reference but converts them as strings.

You can try by updating the extension. This should install h5grove latest version (this can be ensured by running pip install -U h5grove afterwards).

I concur that converting references as strings does not give much useful info so I will try later to implement something more involved to at least give the name of the dataset it refers to.

loichuder avatar Nov 02 '21 16:11 loichuder

Thank you for the update and for looking into this! I tested it and it to work fine. I agree that converting to string isn't ideal, but it's a good enough fix for now for preventing crashes. Thanks again!

cherbon1 avatar Nov 02 '21 19:11 cherbon1

I concur that converting references as strings does not give much useful info so I will try later to implement something more involved to at least give the name of the dataset it refers to.

I remember investigating this a while ago and it did not amount of anything because the reference needs outside information to be resolved. It was some weird stuff like it needs to have the file id of the entity it points to, which I don't know how to deduce consistently.

I am afraid we will have to do with strings for now.

loichuder avatar Apr 19 '24 09:04 loichuder