
Classes of mipfp output objects

Open steffen-stell opened this issue 11 months ago • 0 comments

The classes of the objects returned by Estimate() are somewhat inconsistent. Let me show this with some example data from the survey package.

library(survey)
library(sloop)
library(mipfp)

data(api)

## population marginal totals for each stratum
## (the `_` pipe placeholder requires R >= 4.2)
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018)) |>
  xtabs(Freq ~ stype, data = _)
pop.schwide <- data.frame(sch.wide = c("No", "Yes"), Freq = c(1072, 5122)) |>
  xtabs(Freq ~ sch.wide, data = _)

## weighted sample cross-tabulation, used as the seed
survey <- xtabs(pw ~ stype + sch.wide, apiclus1)

est_ipfp <- Estimate(
  seed = survey,
  target.list = list(1, 2),
  target.data = list(pop.types, pop.schwide)
)

est_ml <- Estimate(
  seed = survey,
  target.list = list(1, 2),
  target.data = list(pop.types, pop.schwide),
  method = "ml"
)

First of all, both of these objects have the same class attribute:

class(est_ipfp)

## [1] "list"  "mipfp"

The problem is that "list" comes before "mipfp". In S3 method dispatch, list methods therefore take precedence over mipfp methods. That is a problem if we want to implement a mipfp method for a generic that already has a list method, because the list method will be used instead. E.g., implementing a method for as.data.frame or as_tibble has no effect, as demonstrated below. The mipfp methods implemented in this package, like print.mipfp or summary.mipfp, only work because there are no list methods to take precedence. The order of the class attribute should be reversed.

as.data.frame.mipfp <- \(x, ...){
    as.data.frame.table(x$x.hat, stringsAsFactors = FALSE)
}

s3_dispatch(as.data.frame(est_ipfp))

## => as.data.frame.list
##  * as.data.frame.mipfp
##  * as.data.frame.default
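
The effect of the class order can be reproduced without running Estimate(). The following is a minimal sketch in base R: `mock` is a hypothetical stand-in for an Estimate() result (only `x.hat` and the class attribute are imitated, with made-up counts), and reversing the class vector is the change proposed in this issue, not something the package currently does.

```r
# Hypothetical stand-in for an Estimate() result: a list holding a 3 x 2
# table in x.hat, with the same class order that mipfp currently assigns.
mock <- structure(
  list(
    x.hat = as.table(matrix(
      1:6, nrow = 3,
      dimnames = list(stype = c("E", "H", "M"), sch.wide = c("No", "Yes"))
    ))
  ),
  class = c("list", "mipfp")
)

# The method we would like to be used for mipfp objects.
as.data.frame.mipfp <- function(x, ...) {
  as.data.frame(x$x.hat, stringsAsFactors = FALSE, ...)
}

# "list" comes first, so as.data.frame.list is dispatched and the
# mipfp method above is ignored.
via_list <- as.data.frame(mock)

# With the order reversed, the mipfp method is found first.
class(mock) <- c("mipfp", "list")
via_mipfp <- as.data.frame(mock)

names(via_mipfp)  # "stype" "sch.wide" "Freq"
```

Only the order of the class vector changes between the two calls; the list contents are identical.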

Moving on to the list elements, we can see that $x.hat and $p.hat have different classes for the ipfp and ml outputs.

lapply(est_ipfp, class)

## $x.hat
## [1] "xtabs" "table"
## 
## $p.hat
## [1] "xtabs" "table"
## 
## $conv
## [1] "logical"
## 
## $error.margins
## [1] "numeric"
## 
## $evol.stp.crit
## [1] "numeric"
## 
## $method
## [1] "character"
## 
## $call
## [1] "call"

lapply(est_ml, class)

## $x.hat
## [1] "matrix" "array" 
## 
## $p.hat
## [1] "matrix" "array" 
## 
## $error.margins
## [1] "numeric"
## 
## $solnp.res
## [1] "list"
## 
## $conv
## [1] "logical"
## 
## $method
## [1] "character"
## 
## $call
## [1] "call"

Again, this leads to different method dispatch, so the same call, e.g. as.data.frame(), produces different output. We can see below that the ipfp result dispatches to as.data.frame.table, while the ml result dispatches to as.data.frame.matrix.

s3_dispatch(as.data.frame(est_ipfp$x.hat))

##    as.data.frame.xtabs
## => as.data.frame.table
##  * as.data.frame.default

s3_dispatch(as.data.frame(est_ml$x.hat))

## => as.data.frame.matrix
##    as.data.frame.double
##  * as.data.frame.numeric
##  * as.data.frame.default

This leads to different returned data structures. as.data.frame.table gives a column to every dimension, plus a Freq column for the values. as.data.frame.matrix puts the first dimension in the row names and the second in the column names; with more than two dimensions, as.data.frame.array concatenates the levels of the further dimensions into the column names, which gets very wide and messy. Clearly, as.data.frame.table provides the output any sane person would want here.

as.data.frame(est_ipfp$x.hat)

##   stype sch.wide      Freq
## 1     E       No  478.0708
## 2     H       No  201.3766
## 3     M       No  392.5526
## 4     E      Yes 3942.9292
## 5     H      Yes  553.6234
## 6     M      Yes  625.4474

as.data.frame(est_ml$x.hat)

##         No       Yes
## E 456.4722 3964.5278
## H 220.8177  534.1823
## M 394.7101  623.2899
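
The same contrast shows up with plain base R objects, independent of mipfp. A small sketch with made-up counts (the object names `m` and `a3` are illustrative only): the table method returns one row per cell, while the matrix and array methods return one row per level of the first dimension.

```r
# Made-up 3 x 2 counts with named dimnames, mirroring the shape of x.hat.
m <- matrix(
  1:6, nrow = 3,
  dimnames = list(stype = c("E", "H", "M"), sch.wide = c("No", "Yes"))
)

wide <- as.data.frame(m)            # dispatches to as.data.frame.matrix
long <- as.data.frame(as.table(m))  # dispatches to as.data.frame.table

dim(wide)  # 3 rows, 2 columns: one column per level of sch.wide
dim(long)  # 6 rows, 3 columns: stype, sch.wide, Freq

# With a third dimension, the wide layout gets one column for every
# combination of the levels of the remaining dimensions.
a3 <- array(
  1:12, dim = c(3, 2, 2),
  dimnames = list(stype = c("E", "H", "M"),
                  sch.wide = c("No", "Yes"),
                  group = c("a", "b"))
)

wide3 <- as.data.frame(a3)            # as.data.frame.array: 3 rows, 4 columns
long3 <- as.data.frame(as.table(a3))  # 12 rows, 4 columns

dim(wide3)
dim(long3)
```

The long layout keeps growing by rows only, with one extra column per added dimension, which is why it stays manageable for higher-dimensional targets.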

Though not included in the class attribute, these xtabs and table objects are also matrices and arrays, at least according to is.matrix() and is.array() (as far as I know, any object with a dim attribute is an array, and it is also a matrix if that dim attribute has length 2). I don’t see any structural differences between table and matrix/array objects. xtabs has the additional call attribute. Probably, $x.hat and $p.hat for the ml method should simply have class table.
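
These checks are easy to run directly. The sketch below uses made-up data; `ml_like` is a hypothetical stand-in for the bare matrix that the ml method returns in $x.hat. Wrapping it in as.table() is a user-side workaround and an assumption about what a fix could look like, not something mipfp does today.

```r
tab2 <- as.table(matrix(1:6, nrow = 3))
tab3 <- as.table(array(1:12, dim = c(3, 2, 2)))

class(tab2)                         # "table": "matrix" and "array" are not listed
c(is.matrix(tab2), is.array(tab2))  # TRUE TRUE: dim has length 2
c(is.matrix(tab3), is.array(tab3))  # FALSE TRUE: dim has length 3

# Hypothetical stand-in for est_ml$x.hat: a plain numeric matrix.
ml_like <- matrix(
  c(10, 20, 30, 40, 50, 60), nrow = 3,
  dimnames = list(stype = c("E", "H", "M"), sch.wide = c("No", "Yes"))
)

# as.table() adds the "table" class and keeps dim and dimnames, so
# as.data.frame() then dispatches to as.data.frame.table (long layout).
ml_tab <- as.table(ml_like)
long <- as.data.frame(ml_tab, stringsAsFactors = FALSE)
```

If the dimnames of the matrix are unnamed, as.data.frame.table falls back to generic column names, so the column names in `long` depend on how the matrix was built.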

Why am I writing all this? Well, converting $x.hat into a data.frame/tibble is something I do regularly while working with this package, and it would be great to have more consistency around these conversions. I might open a PR if I find the time for it. It should be said, though, that changing classes, and thus method dispatch, would be a breaking change. I think it would be worth it to have more consistency.

steffen-stell, Jan 24 '25 14:01