fiftyone [FR] Add support for loading data in HDF5 format

Open brimoor opened this issue 4 years ago • 2 comments

This was a request from a recent webinar.

One option for supporting this would be to ingest the data from HDF5 format and write the images/other media to an internal FiftyOne directory in a standard individual file format at dataset creation time. This is analogous to how we support loading data in TFRecords format, for example.

There is an h5py package on GitHub for working with HDF5-formatted data.

Sep 11 '20 13:09 brimoor

Any progress?

Aug 19 '22 15:08 oguz-hanoglu

We haven't had time to add "native" HDF5 support yet.

FiftyOne currently requires access to each individual image via the filepath field of each sample, which must be in an image format that web browsers can display (png, jpg, tiff -- possibly with a browser extension installed, etc.)

The way to work with HDF5 data currently would be to unpack it using h5py into regular images on disk so you can construct a FiftyOne dataset.

It would be awesome to have a custom importer contributed that would automate this unpacking, similar to how TFRecords can be imported, for example 🤗

Aug 19 '22 16:08 brimoor

Using the library you mentioned, unpacking an HDF5 file looks like this:

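import cv2
import h5py

HDF5_FILE = "data.h5"
with h5py.File(HDF5_FILE, "r") as f:
    # Give each image its own filename so earlier images are not overwritten
    for i, img in enumerate(f["images"]):
        cv2.imwrite(f"image_{i:06d}.png", img)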

So, would it be useful if we simply:

implement a foud.UnlabeledImageDatasetImporter?
take the HDF5 file path and key ("images" in the example) as input?
have the setup method include code similar to the snippet above?

The rest would be very similar to an unlabeled version of this.
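To make the proposal a bit more concrete, here is a minimal sketch of the unpacking step that such an importer's setup could perform. It is only an illustration under some assumptions: the names `unpack_images`, `unpack_hdf5`, and `write_image` are hypothetical (not part of FiftyOne or h5py), the images are assumed to sit under a single key as arrays that `cv2.imwrite` can encode, and `cv2` and `h5py` are imported lazily so the helper can be loaded without them.

```python
import os


def unpack_images(images, out_dir, write_image, ext=".png"):
    """Write each item of `images` to its own file and return the paths.

    `write_image` is any callable taking (path, image), e.g. cv2.imwrite.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, img in enumerate(images):
        # Zero-padded index keeps filenames unique and sortable
        path = os.path.join(out_dir, f"{i:06d}{ext}")
        write_image(path, img)
        paths.append(path)
    return paths


def unpack_hdf5(hdf5_path, key, out_dir):
    """Unpack the image arrays stored under `key` into files in `out_dir`."""
    # Deferred imports (hypothetical design choice): optional dependencies
    # are only needed when an HDF5 file is actually unpacked
    import cv2
    import h5py

    with h5py.File(hdf5_path, "r") as f:
        # Iterating an h5py dataset yields one array per entry on axis 0.
        # Note that cv2.imwrite assumes BGR channel order for color images.
        return unpack_images(f[key], out_dir, cv2.imwrite)
```

The returned paths could then be passed to `fo.Dataset.from_images(...)`, or yielded one at a time as `(image_path, None)` from the `__next__` method of a `foud.UnlabeledImageDatasetImporter` subclass whose `setup()` calls `unpack_hdf5`. Keeping the writer as a parameter also makes the loop easy to test without real image data.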

Nov 05 '22 14:11 oguz-hanoglu

Hello, I also encountered the same problem. Is there any progress? I have a large amount of image data, but reading it from disk is very slow. It would be great if we could read images from formats such as HDF5 or LMDB.

Mar 04 '23 20:03 zero0kiriyu
