TileDB-R
Returned list structures for the same array and group metadata are not identical
Issue
When the same metadata is put on an array and on a group and then retrieved back into R, the returned objects are equivalent but not identical.
When retrieving all metadata:
- the array getter returns a named list classed tiledb_metadata (used for the print method)
- the group getter returns a named list that is not classed, and each element has an attribute named "key"
Is this intentional? I found no documentation or usage explaining why group metadata requires an extra attribute on each element.
Here's a reproducible example:
R Code - reprex
library(tiledb) # version 0.30.2
# metadata for array and group
md <- list("a1" = 1, "b2" = 2)
nms <- names(md)
# Array metadata ------------------------
uri_arr <- tempfile("arr1")
fromDataFrame(data.frame(a = "foo"), uri_arr)
arr_handle <- tiledb_array(uri_arr)
arr_handle <- tiledb_array_open(arr_handle, type = "WRITE")
# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {
    tiledb_put_metadata(arr_handle, key, val)
  })
all(status) # check all OK
#> [1] TRUE
arr_handle <- tiledb_array_close(arr_handle)
arr_handle <- tiledb_array_open(arr_handle, type = "READ")
arr_metadata <- tiledb_get_all_metadata(arr_handle)
# Group metadata ------------------------
uri_grp <- tempfile("grp1")
grp <- tiledb_group_create(uri_grp)
grp <- tiledb_group(grp, type = "WRITE")
# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {
    tiledb_group_put_metadata(grp, key, val)
  })
all(status) # check all OK
#> [1] TRUE
grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, type = "READ")
grp_metadata <- tiledb_group_get_all_metadata(grp)
Results
# What ??? :(
all.equal(arr_metadata, grp_metadata)
[1] "Attributes: < names for target but not for current >"
[2] "Attributes: < Length mismatch: comparison on first 0 components >"
[3] "Component \"a1\": Attributes: < target is NULL, current is list >"
[4] "Component \"b2\": Attributes: < target is NULL, current is list >"
# OK
all.equal(arr_metadata, grp_metadata, check.attributes = FALSE)
[1] TRUE
# Object structure
str(arr_metadata)
List of 2
$ a1: num 1
$ b2: num 2
- attr(*, "class")= chr "tiledb_metadata"
str(grp_metadata)
List of 2
$ a1: num 1
..- attr(*, "key")= chr "a1"
$ b2: num 2
..- attr(*, "key")= chr "b2"
# Print to console
arr_metadata
a1: 1
b2: 2
grp_metadata
$a1
[1] 1
attr(,"key")
[1] "a1"
$b2
[1] 2
attr(,"key")
[1] "b2"
Comments/Notes/Fin
In practice, I strip off the "key" attribute to get an identical output structure, which also helps with unit testing or when mixing array and group metadata for whatever reason.
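For reference, here is a minimal sketch of that workaround. It does not call TileDB at all: the two objects are simulated in base R with the shapes shown in the str() output above, and strip_key() is a hypothetical helper name, not part of the tiledb package.

```r
# Simulated return values (no TileDB needed); shapes copied from the
# str() output of the reprex.
arr_metadata <- structure(list(a1 = 1, b2 = 2), class = "tiledb_metadata")
grp_metadata <- list(
  a1 = structure(1, key = "a1"),
  b2 = structure(2, key = "b2")
)

# Hypothetical helper: drop the per-element "key" attribute and,
# optionally, add the class that the array getter uses.
strip_key <- function(x, class = "tiledb_metadata") {
  out <- lapply(x, function(el) {
    attr(el, "key") <- NULL
    el
  })
  if (!is.null(class)) class(out) <- class
  out
}

grp_clean <- strip_key(grp_metadata)

# Compare the cleaned group metadata with the array metadata
identical(arr_metadata, grp_clean)
```

Passing class = NULL keeps the result a plain named list, which may be preferable if the tiledb_metadata print method is not wanted.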
Other notes and observations:
- The array equivalent of tiledb_group_get_metadata_from_index() is not implemented in R but exists in C++ (tiledb:::libtiledb_array_get_metadata_from_index()).
- tiledb_group_get_all_metadata() is written in R whereas tiledb_get_all_metadata() is written in C++ (loop under the hood), see libtiledb_array_get_metadata_list. This is not an issue other than memory efficiency, but the two implementations would be identical if the group version were also written in C++, e.g., as libtiledb_group_get_metadata_list.
- Metadata-related functions should perhaps get a roxygen tag, @family metadata, which would make it easier to navigate the vast documentation via the auto-generated "See also" links.
- There are no vacuum/consolidation operations for group metadata.
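To illustrate the roxygen suggestion in the list above, here is a rough sketch of how the tag would look. The wrapper get_array_md() is a hypothetical function invented for this example; only the @family tag itself and tiledb_get_all_metadata() are real. roxygen2 cross-links every function that shares the same family name under "See also".

```r
#' Get all metadata of an array (hypothetical wrapper, illustration only)
#'
#' @param arr An array handle opened for reading.
#' @return A named list of metadata.
#' @family metadata
get_array_md <- function(arr) {
  # Delegates to the existing array getter from the tiledb package
  tiledb::tiledb_get_all_metadata(arr)
}
```

Adding the same @family metadata line to the group getters would place both sets of functions in one cross-referenced family.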
I hope the above is helpful towards a consistent metadata interface (structure, class, print method, functionality) :).
Thanks
As a last note, it seems that at the C++ level the group getter assigns a 'key' attribute whereas the array getter assigns 'names', although the code logic is otherwise identical.
libtiledb_array_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L2878-L2879
libtiledb_group_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L5434-L5435