Accessing nested Messages with `[[` does not behave as expected
In R, if I have a nested list, then I can access elements within it using repeated $ signs, or repeated square brackets, like so:
nested_list <- list(top = list(middle = list(tail = 42)))
nested_list$top$middle$tail
## [1] 42
nested_list[['top']]['middle']]['tail']]
## [1] 42
You can also access a nested list by passing a vector to the [[ function, like so:
nested_list[[c('top', 'middle', 'tail')]]
## 42
But while the first two methods work with nested RProtoBuf Messages, the last way doesn't. That is, if I pass a character vector to a Message in double square brackets, then it only returns values from the top level.
The tutorial message types don't demonstrate this problem because all of their nested Message fields are also repeated. Here is a minimal example:
syntax = proto3;
message NestedList {
.TopList top = 1;
}
message TopList {
.MiddleList middle = 1;
}
message MiddleList {
int64 tail = 1;
}
The first two methods work:
readProtoFiles('nested.proto') # containing the definition above
nested_pb <- NestedList$new(top = TopList$new(middle = MiddleList$new(tail = 42)))
nested_pb$top$middle$tail
## [1] 42
nested_pb[['top']][['middle']][['tail']]
## [1] 42
But the last one doesn't.
nested_pb[[c('top', 'middle', 'tail')]]
## message of type 'TopList' with 1 field set
This seems inconsistent.
Interesting. Twenty plus years of R and I learn something new every day.
I had not previously seen the form nested_list[[c('top', 'middle', 'tail')]]. It is ... a little different. Passing a vector in lieu of the more common scalar seems to imply a recursive resolution there. That "feature" may well be missing over on our side.
But does that really matter? Not so sure.
It's something I came across today when hoisting deeply nested data into a flat data frame structure.
If there was an easier way of getting arbitrary nested protobufs into nested R lists, then this would be less of an issue. But as.list.Message isn't recursive, so it's a bit of an involved process.
Do you want to take a stab at fixing it? I would think there is only one place where we define [[ as a function, and you could study what R itself does.
And while I hear your pain loud and clear, I also feel I should point out that for a 10+ year old project it does not appear to have affected most uses and users so far.
Found the relevant part of R source. https://github.com/wch/r-source/blob/bde26f9074dfd72aef79458dcdc030adc82ce0a0/src/main/subscript.c#L305
And what appears to be the definition of [[ in RProtoBuf.
https://github.com/eddelbuettel/rprotobuf/blob/a2ce5e82447bb47ab9cf5bd7f298ba7f58e224ce/R/00classes.R#L391
Leaving these here as future reference for when I (or someone else) get round to trying to implement this.
Or may simply add a warning of the form:
if ( length(i) > 1 ) {
warning( "`i` has length > 1 and only the first element will be used" )
}
Here is a quick fix. Recursive, as you suggested. Works as a patch in the global environment. Haven't tried rebuilding the package with it in, so use with care.
setMethod( "[[", "Message", function(x, i, j, ..., exact = TRUE){
if( missing( i ) ){
stop( "`i` is required" )
}
if( !missing(j) ){
warning( "`j` is ignored" )
}
## This works correctly by number or name. e.g. p[[1]] or p[["name"]]
if( is.character( i ) || is.numeric( i ) ){
if( length(i) > 1 ){
message( "You just tried to index a Message with a vector. Good luck!" )
return( x[[i[1]]][[i[-1]]] )
}
.Call( "getMessageField", x@pointer, i, PACKAGE = "RProtoBuf" )
} else {
stop( "wrong type, `i` should be a character or a number" )
}
} )
And the output:
nested_pb[[c('top', 'middle', 'tail')]]
## You just tried to index a Message with a vector. Good luck!
## You just tried to index a Message with a vector. Good luck!
## [1] 42
does not appear to have affected most uses and users so far.
If you build it, they will come :)
I have tried the recursive approach suggested here many times before finally getting annoyed enough today to seek out a feature request today (just as a caution against concluding "nobody uses this because there are no bug reports").
If there was an easier way of getting arbitrary nested protobufs into nested R lists, then this would be less of an issue
💯 yes, exactly, [[ <vector> ]] seems the most natural way to do a generic recursive query in the current API.
E.g. for writing a function like ProtoToDataFrame():
ProtoToDataFrame <- function(x, fields)
where fields can contain arbitrarily-nested entries, I can't think of any non-awkward approaches. With this fix we could do
fields <- list("key1", c("key2", "nested_key"), "key3")
ProtoToDataFrame <- function(x, fields) {
out <- data.frame()
for (field in fields) out[paste(field, collapse = ".")] <- sapply(x, \(xi) xi[[field]])
out
}
(we also commonly see fields = c("key1", "key2.nested_key", "key3"), evocative of the Python API, in which case we'd use strsplit() on the input first)
Any other suggested approach in the current API would welcome.
you could study what R itself does
That's done here by vectorIndex():
https://github.com/r-devel/r-svn/blob/cf7c5accded5c6f3d5cff344354b8907cd16af4a/src/main/subscript.c#L340-L398
My kludgy workaround that I have been using is simply jsonlite::fromJSON(toJSON(x), flatten = TRUE) and then spend some time cleaning up the variable names in the result. I don't know whether or not this can help inform a 'proper' solution though.
I am open to suggestions and PRs by active users of the package :smile:
My kludgy workaround that I have been using is simply
jsonlite::fromJSON(toJSON(x), flatten = TRUE)
I thought of that but realized our installation somehow doesn't support toJSON, I should invest in fixing that 😅
I am open to suggestions and PRs by active users of the package
The suggested solution above looks reasonable enough to me, if there's no initial issues with it I'll try a PR