
Extensible dataset

Open pwm1234 opened this issue 10 years ago • 5 comments

Much of my work involves processing time samples, so I need to be able to save data with an unlimited axis. Do you have thoughts or suggestions on adding this capability to eigen3-hdf5? If I get something reasonable would you want a pull request or do you want to keep your code free of this additional complexity?

pwm1234 avatar Oct 07 '15 04:10 pwm1234

What do you mean by unlimited axis? That the matrix (or vector) dimension will grow over time, and cannot be specified beforehand?

garrison avatar Oct 07 '15 04:10 garrison

Yes; its maximum dimension will be unlimited. I believe the correct HDF5 term is an extendible dataset. The basic tutorial is "Creating an Extendible Dataset", which says:

An unlimited dimension dataspace is specified with the H5Screate_simple / h5screate_simple_f call, by passing in H5S_UNLIMITED as an element of the maxdims array.

What I need is a dataset where each row of data represents the set of measurements for a slice in time, where I do not know in advance how many rows I will need. So the first dimension will be unlimited. For example, a typical IMU measurement consists of 6 doubles; I would store each measurement, plus its timestamp, as a row in a 7-column matrix [time, ax, ay, az, gx, gy, gz].

pwm1234 avatar Oct 07 '15 11:10 pwm1234

I think the most useful thing would be to be able to save an Eigen matrix to some subportion of an HDF5 array on disk. That way, you could store only the recent samples, adjust the size of the HDF5 array, and save the recent samples to disk.

garrison avatar Oct 16 '15 02:10 garrison

But that is the point of an extendible HDF5 dataset--the library does all of this for you. For example, if I have a Vector3d that I want to write out as time marches on, then I have a dataset that is UNLIMITEDx3 and write one vector at a time. The library takes care of adjusting the actual size on disk. If you do this with fixed-size datasets, you have to overallocate and use a fill value, write each row, expand whenever you reach the current size, copy to a new dataset if you exceed the maximum size, and shrink to the actual size when you are finished.

I will play with it and give you a pull request when I have something that works for me.

pwm1234 avatar Oct 16 '15 05:10 pwm1234

The key missing functionality, though, is the ability to save an Eigen array to a sub-portion of a dataset. With this, it would take just two calls to do what you are describing--one to extend the dataset, and the other to write the data. Once these calls exist (and the second call likely already exists in the HDF5 library), it is easy to make a wrapper function that does both at once.

garrison avatar Oct 17 '15 16:10 garrison