rhdf5
Ability to copy data from one h5 to another
Hi @grimbough, thanks for creating such a useful package! I have been trying to figure this out on my own, but I thought I would ask here. I am trying to copy groups from one HDF5 file to a completely new one, as h5py's group copy does (https://docs.h5py.org/en/stable/high/group.html). From their documentation:
copy(source, dest, name=None, shallow=False, expand_soft=False, expand_external=False, expand_refs=False, without_attrs=False)
This lets you copy datasets between different HDF5 files very quickly, similar to H5Lcreate_external. However, H5Lcreate_external is not a full copy and relies on the original location. I thought H5Lcopy might be the solution, but I can't get it to work, e.g.:
h5_1 <- "test1.h5"
h5_2 <- "test2.h5"
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1,2,3), h5_1, "test")
H5Lcopy(
h5loc = H5Fopen(h5_1),
name = "test",
h5loc_dest = H5Fopen(h5_2),
name_dest = "test"
)
# Error in H5Lcopy(h5loc = H5Fopen(h5_1), name = "test", h5loc_dest = H5Fopen(h5_2), :
# HDF5. Links. Can't move object.
H5Lcopy(
h5loc = H5Fopen(h5_1),
name = "test",
h5loc_dest = H5Fopen(h5_1),
name_dest = "test2"
)
# Works
This feature is really nice because when you write output in parallel, you can then copy the pieces very quickly into a single HDF5 file and still take advantage of parallel I/O. Currently I have to call h5py's version, and I was wondering whether I am missing something. Please let me know if you need anything else.
Thanks!
Thank you for opening this issue.
Please note that the repository maintainer (@grimbough) is currently on parental leave until October 2022 and any response will take longer than usual.
Hi @jeffmgranja,
Thanks for the interest in the package. Sorry it took me a while to get around to replying to this issue.
I think this will require me to make the H5Ocopy() function available in the package. I'll take a look at doing that and get back to you.
This should now be available in rhdf5 version 2.41.2.
library(rhdf5)
## create two example files and add a dataset "test" to the first file
h5_1 <- tempfile(fileext = ".h5")
h5_2 <- tempfile(fileext = ".h5")
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1,2,3), h5_1, "test")
## Open file handles to both files
fid_1 <- H5Fopen(h5_1)
fid_2 <- H5Fopen(h5_2)
## We can copy a dataset inside the same file
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_1, name_dest = "test2")
## Or to a different file with the same name
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_2, name_dest = "test")
## if we want to create a new group hierarchy we have to provide a link creation property list
lcpl <- H5Pcreate("H5P_LINK_CREATE")
H5Pset_create_intermediate_group( lcpl, create_groups = TRUE )
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_2, name_dest = "/foo/baa/test_nested", lcpl = lcpl)
## tidy up
H5Pclose(lcpl)
H5Fclose(fid_1)
H5Fclose(fid_2)
Here's the output from running h5ls() to confirm the new groups have been created.
## Check we now have datasets 'test' and 'test2' in the first file
h5ls( h5_1 )
#>   group  name       otype dclass dim
#> 0     /  test H5I_DATASET  FLOAT   3
#> 1     / test2 H5I_DATASET  FLOAT   3
## Check we have 'test' at the root and 'test_nested' under /foo/baa in the second file
h5ls( h5_2 )
#>      group        name       otype dclass dim
#> 0        /         foo   H5I_GROUP
#> 1     /foo         baa   H5I_GROUP
#> 2 /foo/baa test_nested H5I_DATASET  FLOAT   3
#> 3        /        test H5I_DATASET  FLOAT   3
I've no idea how performant this approach is but hopefully it's useful.
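For the original use case (several files written in parallel, then gathered into one), the same calls can be wrapped in a loop. This is only a rough sketch: merge_h5_files() is a hypothetical helper, not part of rhdf5, and the "/part_<i>/" layout in the destination file is an arbitrary choice made for illustration. It assumes rhdf5 2.41.2 or later so that H5Ocopy() is available.

```r
library(rhdf5)

## Hypothetical helper (not part of rhdf5): copy every top-level object of
## each source file into its own group "/part_<i>" of one destination file.
merge_h5_files <- function(src_files, dest_file) {
  h5createFile(dest_file)
  fid_dest <- H5Fopen(dest_file)
  on.exit(H5Fclose(fid_dest), add = TRUE)

  ## link creation property list so "/part_<i>" is created on demand
  lcpl <- H5Pcreate("H5P_LINK_CREATE")
  on.exit(H5Pclose(lcpl), add = TRUE)
  H5Pset_create_intermediate_group(lcpl, create_groups = TRUE)

  for (i in seq_along(src_files)) {
    ## names of the objects directly under the root of this source file
    top_level <- h5ls(src_files[i], recursive = FALSE)$name

    fid_src <- H5Fopen(src_files[i], flags = "H5F_ACC_RDONLY")
    for (nm in top_level) {
      H5Ocopy(h5loc = fid_src, name = nm,
              h5loc_dest = fid_dest,
              name_dest = paste0("/part_", i, "/", nm),
              lcpl = lcpl)
    }
    H5Fclose(fid_src)
  }
  invisible(dest_file)
}

## Example: three small files, each holding a dataset "test"
parts <- replicate(3, tempfile(fileext = ".h5"))
for (i in seq_along(parts)) {
  h5createFile(parts[i])
  h5write(i * c(1, 2, 3), parts[i], "test")
}
merged <- merge_h5_files(parts, tempfile(fileext = ".h5"))
h5ls(merged)
```

In the underlying HDF5 library, H5Ocopy copies a group together with its members by default, so top-level groups in the source files should be carried over along with their contents.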
As an aside, I'd recommend not using the H5Fopen() calls as part of the argument. Each call returns a file handle, and if you don't assign that to an R variable there's no way to close it outside of calling h5closeAll().
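One way to follow that advice is to keep the handles inside a small function and let on.exit() close them, even if the copy fails. This is a sketch only: copy_h5_object() is a hypothetical wrapper, not an rhdf5 function, and it assumes rhdf5 2.41.2 or later for H5Ocopy().

```r
library(rhdf5)

## Hypothetical wrapper (not part of rhdf5): copy one object between files
## while making sure both file handles are always closed.
copy_h5_object <- function(src_file, dest_file, name, name_dest = name) {
  ## assign each handle to a variable so it can be closed explicitly
  fid_src <- H5Fopen(src_file, flags = "H5F_ACC_RDONLY")
  on.exit(H5Fclose(fid_src), add = TRUE)

  fid_dest <- H5Fopen(dest_file)
  on.exit(H5Fclose(fid_dest), add = TRUE)

  H5Ocopy(h5loc = fid_src, name = name,
          h5loc_dest = fid_dest, name_dest = name_dest)
  invisible(NULL)
}

h5_1 <- tempfile(fileext = ".h5")
h5_2 <- tempfile(fileext = ".h5")
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1, 2, 3), h5_1, "test")

copy_h5_object(h5_1, h5_2, "test")

## read the copy back from the second file
h5read(h5_2, "test")
```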