
Add support for reading slices of data

Open jamesmudd opened this issue 6 years ago • 7 comments

It would be nice to be able to read subsets of datasets, and probably to offer iterators through datasets that return slices. The way to specify slices needs to be figured out.

jamesmudd avatar Feb 20 '19 09:02 jamesmudd

We are using this library in our project, and we now have a requirement to read large datasets. It would be a great help if the slicing feature were available. Do you have any plans to release the slicing feature in the near future?

slathi18 avatar Mar 10 '22 07:03 slathi18

Thanks for the comment. I would really like to add this feature and I don't think it would be too much work. Unfortunately this is a spare-time project for me, so I can't really commit to a timescale. I will try to take a look in the next week and see how quickly this could be done.

jamesmudd avatar Mar 10 '22 21:03 jamesmudd

I have had a look at this and have a WIP branch at https://github.com/jamesmudd/jhdf/tree/slicing-support and I actually think adding basic slicing support will not be a very long task. You didn't say what type of datasets you wanted to slice. Currently I am looking at implementing slicing for contiguous datasets; adding chunked slicing support would then be another task. I still can't commit to a timescale, but I would like to release this as soon as possible and will update the ticket with progress.

jamesmudd avatar Mar 15 '22 20:03 jamesmudd

There is now a PR, #361, which adds support for slicing of contiguous datasets. Here is a jar, jhdf-0.6.6-slice-beta.zip, built from the PR (it needs to be renamed from .zip to .jar to work around GitHub file restrictions).

@slathi18 you didn't mention the type of datasets you wanted to slice, so I'm not sure whether this support is enough for your use case. If you give the jar a try, it would be great to get feedback.

There is a new method Dataset#getData(long[] offset, int[] sliceDimensions) which allows you to specify a slice you would like to take.
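To make the meaning of the two arguments concrete, here is a minimal sketch of offset-plus-dimensions slicing on a row-major, in-memory 2D array. It is plain Java and does not use jhdf. The names `SliceSketch` and `slice2d` are hypothetical, and the sketch assumes `offset` is the start index in each dimension and `sliceDimensions` is the number of elements to read in each dimension. It is not how the PR is implemented.

```java
import java.util.Arrays;

/** Illustration only: a hypothetical helper, not part of jhdf. */
final class SliceSketch {

    /**
     * Copies a 2D slice out of a row-major flat buffer.
     * offset          = start index in each dimension (assumed semantics)
     * sliceDimensions = number of elements to read in each dimension
     */
    static int[][] slice2d(int[] flat, int[] datasetDimensions,
                           long[] offset, int[] sliceDimensions) {
        int rows = datasetDimensions[0];
        int cols = datasetDimensions[1];
        if (offset[0] < 0 || offset[1] < 0
                || offset[0] + sliceDimensions[0] > rows
                || offset[1] + sliceDimensions[1] > cols) {
            throw new IllegalArgumentException("Slice exceeds dataset bounds");
        }
        int[][] result = new int[sliceDimensions[0]][sliceDimensions[1]];
        for (int r = 0; r < sliceDimensions[0]; r++) {
            // Each row of the slice is one contiguous run in the flat buffer
            int start = (int) ((offset[0] + r) * cols + offset[1]);
            System.arraycopy(flat, start, result[r], 0, sliceDimensions[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // A 4 x 5 "dataset" holding the values 0..19 in row-major order
        int[] flat = new int[4 * 5];
        for (int i = 0; i < flat.length; i++) {
            flat[i] = i;
        }
        // Read 2 rows and 3 columns starting at row 1, column 2
        int[][] slice = slice2d(flat, new int[] {4, 5}, new long[] {1, 2}, new int[] {2, 3});
        System.out.println(Arrays.deepToString(slice)); // prints [[7, 8, 9], [12, 13, 14]]
    }
}
```

Under the same assumption, the corresponding call on a contiguous dataset would be `dataset.getData(new long[] {1, 2}, new int[] {2, 3})`.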

The PR still needs more tests and docs; after that I need to look at chunked datasets.

jamesmudd avatar Mar 22 '22 21:03 jamesmudd

Thanks for this quick change. I would like to slice large HDF files, around 11GB in size, and read the project information that resides within them. I want to read each file in chunks so that loading it does not take a long time.

slathi18 avatar Mar 23 '22 10:03 slathi18

@slathi18 are the datasets in your files contiguous or chunked? If they are contiguous then this jar might already work for you.

jamesmudd avatar Mar 23 '22 11:03 jamesmudd

https://github.com/jamesmudd/jhdf/pull/361 adds support for contiguous datasets. This is released in v0.6.6. Support for chunked datasets still needs to be implemented.

jamesmudd avatar Apr 05 '22 20:04 jamesmudd