fiftyone
[FR] Add support for loading data in HDF5 format
This was a request from a recent webinar.
One option for supporting this would be to ingest the data from HDF5 format and, at dataset creation time, write the images/other media to an internal FiftyOne directory as individual files in a standard format. This is analogous to how we support loading data in TFRecords format, for example.
There is an h5py package on GitHub for working with HDF5-formatted data.
Any progress?
We haven't had time to add "native" HDF5 support yet.
FiftyOne currently requires access to each individual image via the `filepath` field of each sample, which must be in an image format that web browsers can display (`png`, `jpg`, `tiff` (possibly with a browser extension installed), etc.).
The way to work with HDF5 data currently is to unpack it using `h5py` into regular images on disk so that you can construct a FiftyOne dataset from them.
It would be awesome to have a custom importer contributed that would automate this unpacking, similar to how TFRecords can be imported, for example 🤗
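Using the library you mentioned, unpacking an HDF5 file looks something like this:
import cv2
import h5py

HDF5_FILE = "data.h5"
with h5py.File(HDF5_FILE, "r") as f:
    # Give each image its own filename so earlier images are not overwritten
    for i, img in enumerate(f["images"]):
        cv2.imwrite(f"{i:06d}.png", img)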
So, would it be useful if we simply:
- implement a `foud.UnlabeledImageDatasetImporter`?
- take the HDF5 file path and key (`"images"` in the example) as input?
- have the `setup()` method include code similar to the snippet above?
The rest would be very similar to the unlabeled version of this.
Hello, I have also encountered the same problem. Is there any progress? I have a large amount of image data, but reading it from disk is very slow. It would be great if we could read images from formats such as HDF5 or LMDB.