jhdf
Unable to read dataset larger than Integer.MAX_VALUE bytes
Exception: Failed to map data buffer for dataset '/train'
at org.example.Texmex.lambda$main$3(Texmex.java:110)
at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: io.jhdf.exceptions.HdfException: Failed to map data buffer for dataset '/train'
at io.jhdf.dataset.ContiguousDatasetImpl.getDataBuffer(ContiguousDatasetImpl.java:44)
at io.jhdf.dataset.DatasetBase.getData(DatasetBase.java:133)
at org.example.Texmex.computeRecallFor(Texmex.java:70)
at org.example.Texmex.lambda$main$3(Texmex.java:108)
... 1 more
Caused by: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1185)
at io.jhdf.storage.HdfFileChannel.mapNoOffset(HdfFileChannel.java:74)
at io.jhdf.storage.HdfFileChannel.map(HdfFileChannel.java:66)
at io.jhdf.dataset.ContiguousDatasetImpl.getDataBuffer(ContiguousDatasetImpl.java:40)
The dataset in question is 3848008288 bytes (http://ann-benchmarks.com/deep-image-96-angular.hdf5).
Thanks for raising this and providing a sample file. This is currently a limitation with contiguous datasets: the whole dataset is mapped as a single buffer, and a single mapping cannot exceed Integer.MAX_VALUE bytes. It would be possible to split the mapping up and read contiguous datasets more like chunked datasets are read. In theory this would also be a nice way to parallelise the reading and gain performance.
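To illustrate the idea of splitting the mapping up, here is a minimal sketch using only java.nio. It is not jhdf code: the class ChunkedMapping, the method mapInChunks and the maxChunk parameter are hypothetical names chosen for this example. It maps a byte range as several read-only buffers, each no larger than maxChunk bytes, so that no single FileChannel.map call is asked for more than Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jhdf.
final class ChunkedMapping {

    /**
     * Maps the byte range [offset, offset + length) as consecutive read-only
     * buffers of at most maxChunk bytes each.
     */
    static List<MappedByteBuffer> mapInChunks(FileChannel channel, long offset, long length, int maxChunk)
            throws IOException {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        List<MappedByteBuffer> buffers = new ArrayList<>();
        long position = offset;
        long remaining = length;
        while (remaining > 0) {
            // Each mapping is capped at maxChunk, which is an int,
            // so it can never exceed Integer.MAX_VALUE.
            long size = Math.min(remaining, (long) maxChunk);
            buffers.add(channel.map(FileChannel.MapMode.READ_ONLY, position, size));
            position += size;
            remaining -= size;
        }
        return buffers;
    }
}
```

In practice maxChunk would probably need to be a multiple of the element size (4 bytes for float) so that no value straddles two buffers. Each buffer could then be decoded independently, which is what would make parallel reading possible.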
In the meantime, you could try slice reading, using Dataset#getData(long[] sliceOffset, int[] sliceDimensions).
Code like the following seems to work (it is definitely not optimal; it takes about 30 seconds to read on my system):
import io.jhdf.HdfFile;
import io.jhdf.api.Dataset;

import java.lang.reflect.Array;
import java.nio.file.Paths;

public class ReadDataset {
    public static void main(String[] args) {
        try (HdfFile hdfFile = new HdfFile(Paths.get("/path/to/deep-image-96-angular.hdf5"))) {
            Dataset dataset = hdfFile.getDatasetByPath("/train");
            int[] dimensions = dataset.getDimensions();
            // Allocate the full 2D result array up front
            float[][] data = (float[][]) Array.newInstance(dataset.getJavaType(), dimensions);
            // Read one row at a time through the slice API instead of mapping the whole dataset at once
            for (int i = 0; i < dimensions[0]; i++) {
                data[i] = ((float[][]) dataset.getData(new long[]{i, 0}, new int[]{1, dimensions[1]}))[0];
            }
            System.out.println("Finished read");
        }
    }
}
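One possible refinement of the row-by-row loop is to request several rows per slice call. The sketch below shows only the batching logic and is untested against jhdf: SliceReader is a hypothetical stand-in for Dataset#getData(long[] sliceOffset, int[] sliceDimensions), and batchRows is an assumed tuning parameter. Whether larger slices are actually faster, and how large a slice can be before hitting the same size limit, would need to be measured.

```java
// Hypothetical helper, not part of jhdf.
final class BatchedSliceRead {

    /** Stand-in for Dataset#getData(long[] sliceOffset, int[] sliceDimensions) on a 2D float dataset. */
    interface SliceReader {
        float[][] read(long[] sliceOffset, int[] sliceDimensions);
    }

    /** Reads a rows x cols dataset in slices of at most batchRows rows each. */
    static float[][] readInBatches(SliceReader reader, int rows, int cols, int batchRows) {
        if (batchRows <= 0) {
            throw new IllegalArgumentException("batchRows must be positive");
        }
        float[][] data = new float[rows][];
        int start = 0;
        while (start < rows) {
            // The last batch may be shorter than batchRows
            int count = Math.min(batchRows, rows - start);
            float[][] batch = reader.read(new long[]{start, 0}, new int[]{count, cols});
            // Copy the row references of this batch into the result
            System.arraycopy(batch, 0, data, start, count);
            start += count;
        }
        return data;
    }
}
```

With jhdf, the reader argument could be written as (offset, dims) -> (float[][]) dataset.getData(offset, dims), with rows and cols taken from dataset.getDimensions() as in the code above.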