
Improve `cbind` and `rbind` dispatch

Open stefvanbuuren opened this issue 6 years ago • 2 comments

mice 3.1.3 masks the base::cbind() and base::rbind() functions with its own versions.

This is not elegant, and it throws a warning when the package is loaded. I am looking for a better way to implement dispatch. See #114

For more background see https://stackoverflow.com/questions/47967264/dispatch-of-rbind-and-cbind-for-a-data-frame.
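As a minimal sketch of why dispatch is the sticking point: per `?cbind`, cbind() and rbind() do not dispatch through UseMethod() but through internal C code that looks at the classes of all arguments. The class `toy` and its method below are made up for illustration and are not part of mice. With a single classed argument the method is found; whether it is still chosen once a data.frame joins the arguments depends on those internal rules, so the sketch reports the answer rather than assuming it.

```r
# Toy class, used only to illustrate method lookup; it is not the mids class.
x <- structure(list(a = 1), class = "toy")

# Hypothetical S3 method for the toy class.
cbind.toy <- function(...) "cbind.toy was called"

# Only one classed argument: the internal dispatch finds cbind.toy.
single <- cbind(x)
print(single)

# Mixed arguments: cbind.data.frame also applies, so the internal rules
# decide which method (if any) runs. Report the outcome instead of guessing.
mixed <- cbind(x, data.frame(b = 2))
print(identical(mixed, "cbind.toy was called"))
```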

Martin Maechler suggested a solution. Does anyone have time to dive in?

stefvanbuuren, Jun 29 '18 07:06

I don't have any great suggestions but I figured some thoughts might help in any case!

There are two possible return types from a call to cbind(a_mids_object, a_data.frame_object) that make some sense:

  1. a data.frame object is returned, or;
  2. a mids object is returned.

What if the output is a data.frame?

On the implementation side, one would just need to implement an as.data.frame.mids() function. How a mids object should be coerced into a data.frame is somewhat ambiguous, but I think the long format of the completed data set makes some sense. If you wanted to add a new column of data, it would be recycled as many times as required by default (I believe), so you would get something like:

  .imp .id mids.col.1 new.col.1
1    1   1          1         3
2    1   2          2         4

Unfortunately, there isn't such an obvious option for outputting a data.frame from a similar call to rbind because:

  1. Unlike cbind(), base rbind() does not even call as.data.frame() when one (or more) of the arguments is a data.frame.
  2. A user of rbind() probably intends to add new rows to each imputed data set. If the output is to be a data.frame containing the long format of the completed data set, then the new rows need to be duplicated for each imputation and given .imp and .id values.

What if the output is a mids object?

On the other hand, if the output is a mids object, then I think the options are:

  • mask the base functions cbind() and rbind() and live with the associated warning;
  • encourage the use of a call like cbind(a_mids_object, as.mids(a_data.frame_object)), or;
  • abandon cbind() and rbind() and use a different name to describe what is happening, e.g. after_mice_cbind(a_mids_object, ...) or something like add_passive_columns(a_mids_object, ...), to reflect that the columns were exempt from the imputation models and procedure.

There seems to be more going on in a call to cbind() and rbind() with a mids object and a data.frame than simply adding the columns and rows from the data.frame. For this reason, it might be good for the user to be clear (in their code) about what is done to the data.frame in order to attach it to the mids object, hinting that the last two options might be preferable to the first.
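To make the third option concrete, here is a rough sketch of an explicitly named helper. Everything in it is an assumption for illustration: the `toy` list is a simplified stand-in for a mids object (observed data plus one data frame of imputations per variable), not the real class, and add_passive_columns() is the hypothetical name suggested above, not a mice function. It also assumes that missing cells in the new columns get NA imputations, since those columns were never imputed.

```r
# Toy stand-in for a mids object: NOT the real mice class, only the
# pieces needed here (m, the observed data, per-variable imputations).
toy <- list(
  m = 2L,
  data = data.frame(x = c(1, NA, 3)),
  imp = list(
    x = data.frame(`1` = 2.1, `2` = 1.9, row.names = "2", check.names = FALSE)
  )
)

# Hypothetical helper named after the suggestion above; not part of mice.
add_passive_columns <- function(obj, new) {
  stopifnot(is.data.frame(new), nrow(new) == nrow(obj$data))
  for (nm in names(new)) {
    miss <- which(is.na(new[[nm]]))
    # One row per missing cell, one column per imputation, all NA,
    # because the new columns were exempt from the imputation procedure.
    obj$imp[[nm]] <- as.data.frame(
      matrix(NA_real_, nrow = length(miss), ncol = obj$m,
             dimnames = list(as.character(miss), as.character(seq_len(obj$m))))
    )
  }
  # Both arguments are data frames, so this is an ordinary column bind.
  obj$data <- cbind(obj$data, new)
  obj
}

out <- add_passive_columns(toy, data.frame(z = c(10, NA, 30)))
print(names(out$data))
print(out$imp$z)
```

A helper like this keeps the bookkeeping visible in the user's code and avoids relying on how cbind() dispatches when a data.frame is among the arguments.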

stephematician, Jul 02 '18 00:07

Thanks. I would like to be able to run cbind(a_mids_object, a_data.frame_object) and return a mids object, with the contents of the a_data.frame_object incorporated as new columns. Any imputations needed in the new data should be set to NA.

stefvanbuuren, Jul 03 '18 18:07

I am no longer looking for a solution, so I am closing this issue.

Overwriting base functions - as mice does - is not elegant, but it works and is nowadays accepted in package development.

stefvanbuuren, Nov 14 '22 15:11