
Ability to copy data from one h5 to another

Open jeffmgranja opened this issue 1 year ago • 3 comments

Hi @grimbough, thanks for creating such a useful package! I have been trying to figure this out on my own, but thought I would ask here. I am trying to copy groups from one HDF5 file to a completely new one, as with h5py's group copy (https://docs.h5py.org/en/stable/high/group.html). From their documentation:

copy(source, dest, name=None, shallow=False, expand_soft=False, expand_external=False, expand_refs=False, without_attrs=False)

You can pull in datasets from different HDF5 files very quickly with something like H5Lcreate_external. However, that is not a full copy and it relies on the original file staying in place. I thought H5Lcopy might be the solution, but I can't get it to work, e.g.:

h5_1 <- "test1.h5"
h5_2 <- "test2.h5"
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1,2,3), h5_1, "test")
H5Lcopy(
    h5loc = H5Fopen(h5_1), 
    name = "test", 
    h5loc_dest = H5Fopen(h5_2), 
    name_dest = "test"
)
# Error in H5Lcopy(h5loc = H5Fopen(h5_1), name = "test", h5loc_dest = H5Fopen(h5_2),  : 
#   HDF5. Links. Can't move object.
H5Lcopy(
    h5loc = H5Fopen(h5_1), 
    name = "test", 
    h5loc_dest = H5Fopen(h5_1), 
    name_dest = "test2"
)
# Works

This feature is really useful: when you write results out in parallel to separate files, you can then copy them very quickly into a single HDF5 file and still take advantage of parallel I/O. Currently I have to call h5py's version, and I was wondering whether I am missing something. Please let me know if you need anything else.

Thanks!

jeffmgranja, Aug 15 '22 04:08

Thank you for opening this issue.

Please note that the repository maintainer (@grimbough) is currently on parental leave until October 2022 and any response will take longer than usual.

github-actions[bot], Aug 15 '22 04:08

Hi @jeffmgranja,

Thanks for the interest in the package, and sorry it took me a while to reply to this issue.

I think this will require me to make the H5Ocopy() function available in the package. I'll take a look at doing that and get back to you.

grimbough, Oct 25 '22 13:10

This should now be available in rhdf5 version 2.41.2.

library(rhdf5)

## create two example files and add a dataset "test" to the first file
h5_1 <- tempfile(fileext = ".h5")
h5_2 <- tempfile(fileext = ".h5")
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1,2,3), h5_1, "test")

## Open file handles to both files
fid_1 <- H5Fopen(h5_1)
fid_2 <- H5Fopen(h5_2)

## We can copy a dataset inside the same file
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_1, name_dest = "test2")

## Or to a different file with the same name
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_2, name_dest = "test")

## if we want to create a new group hierarchy we have to provide a link creation property list
lcpl <- H5Pcreate("H5P_LINK_CREATE")
H5Pset_create_intermediate_group( lcpl, create_groups = TRUE )
H5Ocopy(h5loc = fid_1, name = "test", h5loc_dest = fid_2, name_dest = "/foo/baa/test_nested", lcpl = lcpl)

## tidy up
H5Pclose(lcpl)
H5Fclose(fid_1)
H5Fclose(fid_2)

Here's the output from running h5ls() to confirm the new datasets and groups have been created.

## Check we now have datasets 'test' and 'test2' in the first file
h5ls( h5_1 )
#>   group  name       otype dclass dim
#> 0     /  test H5I_DATASET  FLOAT   3
#> 1     / test2 H5I_DATASET  FLOAT   3
## Check we have 'test' at the root and 'test_nested' under /foo/baa in the second file
h5ls( h5_2 )
#>      group        name       otype dclass dim
#> 0        /         foo   H5I_GROUP           
#> 1     /foo         baa   H5I_GROUP           
#> 2 /foo/baa test_nested H5I_DATASET  FLOAT   3
#> 3        /        test H5I_DATASET  FLOAT   3

I've no idea how performant this approach is but hopefully it's useful.
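
For the use case in the original question (several files written in parallel, then gathered into one), the same H5Ocopy() call can be looped over the part files. The following is only a sketch under some assumptions: merge_parts(), the file names, and the chunk_<i> group names are made up for illustration and are not part of rhdf5, and it needs rhdf5 >= 2.41.2. It relies on the HDF5 default that copying a group also copies its members.

```r
library(rhdf5)

## Hypothetical helper (not an rhdf5 function): copy every top-level
## object of each part file into one newly created combined file.
merge_parts <- function(part_files, combined_file) {
  h5createFile(combined_file)
  fid_out <- H5Fopen(combined_file)
  on.exit(H5Fclose(fid_out), add = TRUE)

  for (part in part_files) {
    ## list top-level names only; groups are copied with their contents
    top_level <- h5ls(part, recursive = FALSE)$name
    fid_in <- H5Fopen(part)
    tryCatch({
      for (nm in top_level) {
        H5Ocopy(h5loc = fid_in, name = nm,
                h5loc_dest = fid_out, name_dest = nm)
      }
    }, finally = H5Fclose(fid_in))
  }
  invisible(combined_file)
}

## Example: two part files, each holding one group with one dataset
parts <- c(tempfile(fileext = ".h5"), tempfile(fileext = ".h5"))
for (i in seq_along(parts)) {
  h5createFile(parts[i])
  h5createGroup(parts[i], paste0("chunk_", i))
  h5write(c(1, 2, 3) * i, parts[i], paste0("chunk_", i, "/values"))
}

combined <- merge_parts(parts, tempfile(fileext = ".h5"))
h5ls(combined)
```

Note that the sketch assumes the top-level names do not clash between part files; if they do, H5Ocopy() will fail for the second copy, so a real version would need to rename or nest them.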


As an aside, I'd recommend not using the H5Fopen() calls as part of the argument. Each call returns a file handle, and if you don't assign that to an R variable there's no way to close it outside of calling h5closeAll().
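
One way to follow that advice is to keep both handles in variables inside a small wrapper and register the H5Fclose() calls with on.exit(), so the handles are released even if the copy fails. This is a sketch only: copy_h5_object() is a hypothetical helper for illustration, not an rhdf5 function, and it assumes rhdf5 >= 2.41.2.

```r
library(rhdf5)

## Hypothetical wrapper: open both files, copy one object, always close.
copy_h5_object <- function(src_file, dest_file, name, name_dest = name) {
  fid_src <- H5Fopen(src_file)
  on.exit(H5Fclose(fid_src), add = TRUE)
  fid_dest <- H5Fopen(dest_file)
  on.exit(H5Fclose(fid_dest), add = TRUE)

  H5Ocopy(h5loc = fid_src, name = name,
          h5loc_dest = fid_dest, name_dest = name_dest)
  invisible(NULL)
}

## Same setup as the example above
h5_1 <- tempfile(fileext = ".h5")
h5_2 <- tempfile(fileext = ".h5")
h5createFile(h5_1)
h5createFile(h5_2)
h5write(c(1, 2, 3), h5_1, "test")

copy_h5_object(h5_1, h5_2, "test")
h5ls(h5_2)
```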

grimbough, Oct 26 '22 12:10