TileDB-R

Returning list structures for the same array and group metadata are not identical

Open cgiachalis opened this issue 1 year ago • 1 comments

Issue

When the same metadata is put on an array and on a group and then retrieved back into R, the returned objects are equivalent but not identical.

When retrieving all metadata:

  • the array getter returns a named list with class tiledb_metadata (the class is used for the print method)
  • the group getter returns a named list with no class, and each element carries an attribute named "key"

Is this intentional? I found no documentation or usage explaining why group metadata requires an extra attribute on each element.

Here's a reproducible example:

R Code - reprex
library(tiledb) # version 0.30.2

# metadata for array and group
md <- list("a1" = 1, "b2" = 2)
nms <- names(md)

# Array metadata ------------------------

uri_arr <- tempfile("arr1")
fromDataFrame(data.frame(a = "foo"), uri_arr)
arr_handle <- tiledb_array(uri_arr)

arr_handle <- tiledb_array_open(arr_handle, type = "WRITE")

# Put metadata

status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_put_metadata(arr_handle, key, val)})

all(status) # check all OK
#> [1] TRUE

arr_handle <- tiledb_array_close(arr_handle)
arr_handle <- tiledb_array_open(arr_handle, type = "READ")

arr_metadata <- tiledb_get_all_metadata(arr_handle)

# Group metadata ------------------------

uri_grp <- tempfile("grp1")
grp <- tiledb_group_create(uri_grp)
grp <- tiledb_group(grp, type = "WRITE")

# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_group_put_metadata(grp, key, val)})

all(status) # check all OK
#> [1] TRUE

grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, type = "READ")

grp_metadata <- tiledb_group_get_all_metadata(grp)

Results

# What ??? :(
all.equal(arr_metadata, grp_metadata)
 [1] "Attributes: < names for target but not for current >"             
 [2] "Attributes: < Length mismatch: comparison on first 0 components >"
 [3] "Component \"a1\": Attributes: < target is NULL, current is list >"
 [4] "Component \"b2\": Attributes: < target is NULL, current is list >"

# OK
all.equal(arr_metadata, grp_metadata, check.attributes = FALSE)
[1] TRUE

# Object structure
str(arr_metadata)
 List of 2
  $ a1: num 1
  $ b2: num 2
  - attr(*, "class")= chr "tiledb_metadata"
  
str(grp_metadata)
 List of 2
  $ a1: num 1
   ..- attr(*, "key")= chr "a1"
  $ b2: num 2
   ..- attr(*, "key")= chr "b2"


# Print to console
arr_metadata
a1:	1
b2:	2

grp_metadata
$a1
[1] 1
attr(,"key")
[1] "a1"

$b2
[1] 2
attr(,"key")
[1] "b2"

Comments/Notes/Fin

In practice, I strip off the "key" attribute to get an identical output structure, which also helps when unit testing or when mixing array and group metadata.
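The workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, not code from tiledb: strip_key_attr() is a hypothetical helper, and the two objects are base-R stand-ins that mimic the str() output shown earlier, so the snippet runs without creating an array or group.

```r
# Stand-ins for the two objects shown under "Object structure" above,
# built with base R so the sketch runs without a TileDB array or group.
arr_metadata <- structure(list(a1 = 1, b2 = 2), class = "tiledb_metadata")
grp_metadata <- list(
  a1 = structure(1, key = "a1"),
  b2 = structure(2, key = "b2")
)

# Hypothetical helper (not part of tiledb): remove the per-element "key"
# attribute and, optionally, add the class used by the array getter.
strip_key_attr <- function(x, as_class = NULL) {
  out <- lapply(x, function(el) {
    attr(el, "key") <- NULL
    el
  })
  if (!is.null(as_class)) class(out) <- as_class
  out
}

grp_plain <- strip_key_attr(grp_metadata)
grp_classed <- strip_key_attr(grp_metadata, as_class = "tiledb_metadata")

# Compare against the array result, with and without the class.
identical(unclass(arr_metadata), grp_plain)
identical(arr_metadata, grp_classed)
```

For these stand-ins both comparisons should come out TRUE; with real getter output the same helper should apply as long as the structures match what str() printed above.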

Other notes and observations:

  • The array equivalent of tiledb_group_get_metadata_from_index() has no exported R wrapper, although it exists in C++ (tiledb:::libtiledb_array_get_metadata_from_index()).

  • tiledb_group_get_all_metadata() is written in R, whereas tiledb_get_all_metadata() is implemented in C++ (it loops under the hood, see libtiledb_array_get_metadata_list). This is not an issue other than memory efficiency, but the two implementations would be identical if the group version were also written in C++, e.g. as libtiledb_group_get_metadata_list.

  • Metadata-related functions could perhaps get the roxygen tag @family metadata, which would make the vast documentation easier to navigate through the auto-generated "See also" links.

  • There are no vacuum/consolidation operations for group metadata.
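To illustrate the roxygen suggestion in the list above, here is a minimal sketch of how the tag would sit in a documentation block. get_md_value() is a hypothetical helper invented for this example, not a tiledb function; only the @family line is the point.

```r
#' Read one metadata value (hypothetical helper, for illustration only)
#'
#' @param x A named list of metadata, e.g. the result of a getter.
#' @param key A character string naming the entry to return.
#' @return The value stored under `key`, or NULL if it is absent.
#' @family metadata
#' @export
get_md_value <- function(x, key) {
  x[[key]]
}

# Usage with a plain named list
get_md_value(list(a1 = 1, b2 = 2), "b2")
```

Every function documented with the same @family name gets cross-linked in the "See also" section of the generated help pages.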

I hope the above is helpful towards a consistent metadata interface (structure, class, print method, functionality) :)

Thanks

cgiachalis, Oct 22 '24 11:10

As a last note, it seems that at the C++ level the group getter assigns a 'key' attribute to the returned value, whereas the array getter assigns 'names', although the code logic is otherwise identical.

libtiledb_array_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L2878-L2879

libtiledb_group_get_metadata_from_index https://github.com/TileDB-Inc/TileDB-R/blob/c2ba622f7ca0e5bb448f127e0e113bcac277a486/src/libtiledb.cpp#L5434-L5435

cgiachalis, Oct 22 '24 15:10