rprotobuf icon indicating copy to clipboard operation
rprotobuf copied to clipboard

Accessing nested Messages with `[[` does not behave as expected

Open Selbosh opened this issue 5 years ago • 9 comments

In R, if I have a nested list, then I can access elements within it using repeated $ signs, or repeated square brackets, like so:

nested_list <- list(top = list(middle = list(tail = 42)))
nested_list$top$middle$tail
## [1] 42
nested_list[['top']]['middle']]['tail']]
## [1] 42

You can also access a nested list by passing a vector to the [[ function, like so:

nested_list[[c('top', 'middle', 'tail')]]
## 42

But while the first two methods work with nested RProtoBuf Messages, the last way doesn't. That is, if I pass a character vector to a Message in double square brackets, then it only returns values from the top level.

The tutorial message types don't demonstrate this problem because all of their nested Message fields are also repeated. Here is a minimal example:

syntax = proto3;
message NestedList {
  .TopList top = 1;
}
message TopList {
  .MiddleList middle = 1;
}
message MiddleList {
  int64 tail = 1;
}

The first two methods work:

readProtoFiles('nested.proto') # containing the definition above
nested_pb <- NestedList$new(top = TopList$new(middle = MiddleList$new(tail = 42)))
nested_pb$top$middle$tail
## [1] 42
nested_pb[['top']][['middle']][['tail']]
## [1] 42

But the last one doesn't.

nested_pb[[c('top', 'middle', 'tail')]]
## message of type 'TopList' with 1 field set

This seems inconsistent.

Selbosh avatar Dec 04 '20 19:12 Selbosh

Interesting. Twenty plus years of R and I learn something new every day.

I had not previously seen the form nested_list[[c('top', 'middle', 'tail')]]. It is ... a little different. Passing a vector in lieu of the more common scalar seems to imply a recursive resolution there. That "feature" may well be missing over on our side.

But does that really matter? Not so sure.

eddelbuettel avatar Dec 04 '20 19:12 eddelbuettel

It's something I came across today when hoisting deeply nested data into a flat data frame structure.

If there was an easier way of getting arbitrary nested protobufs into nested R lists, then this would be less of an issue. But as.list.Message isn't recursive, so it's a bit of an involved process.

Selbosh avatar Dec 04 '20 20:12 Selbosh

Do you want to take a stab at fixing it? I would think there is only one place where we define [[ as a function, and you could study what R itself does.

And while I hear your pain loud and clear, I also feel I should point out that for a 10+ year old project it does not appear to have affected most uses and users so far.

eddelbuettel avatar Dec 04 '20 20:12 eddelbuettel

Found the relevant part of R source. https://github.com/wch/r-source/blob/bde26f9074dfd72aef79458dcdc030adc82ce0a0/src/main/subscript.c#L305

And what appears to be the definition of [[ in RProtoBuf. https://github.com/eddelbuettel/rprotobuf/blob/a2ce5e82447bb47ab9cf5bd7f298ba7f58e224ce/R/00classes.R#L391

Leaving these here as future reference for when I (or someone else) get round to trying to implement this.

Or may simply add a warning of the form:

if ( length(i) > 1 ) {
  warning( "`i` has length > 1 and only the first element will be used" )
}

Selbosh avatar Dec 04 '20 22:12 Selbosh

Here is a quick fix. Recursive, as you suggested. Works as a patch in the global environment. Haven't tried rebuilding the package with it in, so use with care.

setMethod( "[[", "Message", function(x, i, j, ..., exact = TRUE){
	if( missing( i ) ){
		stop( "`i` is required" )
	}
	if( !missing(j) ){
		warning( "`j` is ignored" )
	}

        ## This works correctly by number or name. e.g. p[[1]] or p[["name"]]
	if( is.character( i ) || is.numeric( i ) ){

		if( length(i) > 1 ){
       			message( "You just tried to index a Message with a vector. Good luck!" )
       			return( x[[i[1]]][[i[-1]]] )
		}

		.Call( "getMessageField", x@pointer, i, PACKAGE = "RProtoBuf" )
	} else {
		stop( "wrong type, `i` should be a character or a number" )
	}

} )

And the output:

nested_pb[[c('top', 'middle', 'tail')]]
## You just tried to index a Message with a vector. Good luck!
## You just tried to index a Message with a vector. Good luck!
## [1] 42

Selbosh avatar Dec 04 '20 23:12 Selbosh

does not appear to have affected most uses and users so far.

If you build it, they will come :)

I have tried the recursive approach suggested here many times before finally getting annoyed enough today to seek out a feature request today (just as a caution against concluding "nobody uses this because there are no bug reports").

If there was an easier way of getting arbitrary nested protobufs into nested R lists, then this would be less of an issue

💯 yes, exactly, [[ <vector> ]] seems the most natural way to do a generic recursive query in the current API.

E.g. for writing a function like ProtoToDataFrame():

ProtoToDataFrame <- function(x, fields)

where fields can contain arbitrarily-nested entries, I can't think of any non-awkward approaches. With this fix we could do

fields <- list("key1", c("key2", "nested_key"), "key3")
ProtoToDataFrame <- function(x, fields) {
  out <- data.frame()
  for (field in fields) out[paste(field, collapse = ".")] <- sapply(x, \(xi) xi[[field]])
  out
}

(we also commonly see fields = c("key1", "key2.nested_key", "key3"), evocative of the Python API, in which case we'd use strsplit() on the input first)

Any other suggested approach in the current API would welcome.

you could study what R itself does

That's done here by vectorIndex():

https://github.com/r-devel/r-svn/blob/cf7c5accded5c6f3d5cff344354b8907cd16af4a/src/main/subscript.c#L340-L398

MichaelChirico avatar May 09 '23 01:05 MichaelChirico

My kludgy workaround that I have been using is simply jsonlite::fromJSON(toJSON(x), flatten = TRUE) and then spend some time cleaning up the variable names in the result. I don't know whether or not this can help inform a 'proper' solution though.

Selbosh avatar May 09 '23 07:05 Selbosh

I am open to suggestions and PRs by active users of the package :smile:

eddelbuettel avatar May 09 '23 11:05 eddelbuettel

My kludgy workaround that I have been using is simply jsonlite::fromJSON(toJSON(x), flatten = TRUE)

I thought of that but realized our installation somehow doesn't support toJSON, I should invest in fixing that 😅

I am open to suggestions and PRs by active users of the package

The suggested solution above looks reasonable enough to me, if there's no initial issues with it I'll try a PR

MichaelChirico avatar May 09 '23 13:05 MichaelChirico