ggpmisc Support Error in Variables (EIV) models and equations

ggpmisc allows for x an y orientation for equations. However, there are some datasets where EIV is appropriate. Is there a sensible way to make it easy to use EIV (e.g. https://search.r-project.org/CRAN/refmans/eivtools/html/eivreg.html) with the same line plot and equation/r2 etc text functionality that we have come to know and love in ggpmisc?

Mar 19 '25 02:03 markbneal

Hi @markbneal , This is the first time I come across package 'eivreg'. I checked 'broom' and 'broom.helpers' and I could not find support for objects of class eivlm. On the other hand these objects look similar to class lm with a few additional slots.

This is only a guess, and making it work could depend on the availability of query methods for the model fit object. It is past midnight here, I will check tomorrow, but it is possible that stat_poly_eq() and stat_poly_line() can cope. Passing method = "eivreg" and any additional arguments as a named list through parameter method.args might work. I do not remember how strict is the check I use for the value of the argument passed to method, if a character string triggers an error, you can also try the bare function name method = eivreg. If you need to pass an argument to parameter method of eivreg() one approach is to pass, for example, method = "eivreg:qr". Alternatively, it may be possible to pass it through method.args. I suspect that if it does not work, it should not require any big changes to the stats' code.

Thanks for another very interesting question! (Nowadays I use quantile regression very frequently!! Thanks to the issue you raised some years ago.)

Mar 19 '25 23:03 aphalo

If edits are needed, this could be addressed together with #39.

Mar 20 '25 01:03 aphalo

@markbneal stat_poly_line() and stat_poly_eq() do not work with eivreg(). Reasons are: the object returned by summary() on these objects is quite different than for lm(). There is no predict() method defined for the eivlm class. Obtaining a prediction of the line should not be a big problem, but I have no idea how to compute a confidence band for these models. In addition, I have a test for the class of the model fit object requiring it to be derived from class lm and this is not the case for eivlm objects.

I noticed that there are other packages that also implement EIV. I haven't checked them, but if any of these has a predict() method for the objects they create or are supported by 'broom', it would make writing a new pair of stats easier. Have you checked any of them?

Mar 20 '25 10:03 aphalo

@markbneal Hi Mark! Would plotting a fitted or predicted line without a confidence band and the adjusted estimates of parameter values as an equation be enough for your needs? This seems doable with not too much effort.

Mar 26 '25 15:03 aphalo

Yes, thanks, the lines without confidence bands would be an excellent start. If the relevant statistics can be printed, that gives some indication of how good the fit etc will be.

I'm not familiar enough with EIV to give advice on alternative packages and their benefits or otherwise, and whether some might make the confidence interval graphing process simpler.

Mar 26 '25 18:03 markbneal

Hi. Do you have some data and an example fit code that I could use? The user would have to provide the Sigma_error matrix. Such an example would help. I mean with a single continuous explanatory variable, rather than with multiple explanatory variables as in the eivreg() help page examples.

I do plan to address issue #30 allowing by-panel in addtion to by-group fits at some point as I think fitting linear models where the groupings can be included as factors in an ANCOVA or similar model with lm() or eivreg() would be very useful. In the meantime, I would like to start by implementing EIV for the case of the simpler model formula y ~ x. My intention is to implement by panel-fits in the existing statistics.

Mar 27 '25 12:03 aphalo

Hi Pedro, Sorry for delay. I have one of my colleagues pulling together a nice data set for testing with this. will get back to you in a few days.

Apr 03 '25 21:04 markbneal

eivtools has been removed from CRAN, i'm not sure if i should take this as a sign that it might not be maintained in the future.

Google had this to say about options on CRAN.

Several CRAN packages provide tools for analyzing data with errors in variables: eive: An algorithm for reducing errors-in-variables bias in linear regression. hdme: High-Dimensional Regression with Measurement Error, including penalized regression for generalized linear models. eivtools: Measurement Error Modeling Tools, including functions for analysis with error-prone covariates. (now off cran) meerva: A package for augmented estimates in regression models with measurement error. sleev: Semiparametric Likelihood Estimation with Errors in Variables. tls: Tools of Total Least Squares in Error-in-Variables Models.

Some investigation:

eivtools had jags as a dependency for fitting, and as you mentioned, the method requires a value for reliability of x (or vector for multiple regression) or sigma value (matrix/array for multiple regression). It seems like a reasonable requirement that the user provide some info about the relative reliability, though i don't know how a user is meant to come up with this number for real data.

eive doesn't require jags, fitting with a genetic algortihm instead. It doesn't require a sigma value, but its not clear to me if this a helpful thing if it hides / makes some implicit assumption. The output looks pretty consistent with lm.

hdme is a bit hard core, i didn't see an example i could replicate emulate for a simple problem (e.g. one x, one y variable).

meerva seems more about the adjustment of the model if you have some non-noisy data.

sleev seems to be dealing with a similar problem type to meerva.

tls looks like a pretty lightweight package, but requires there to be no intercept, so not very generalisable.

On github I found EIVmodels (https://github.com/ncahill89/EIVmodels). Jags as dependency, as does eivtools. doesn't seem to require sigma value. Output less consistent with LM.

refitME (on cran) might also be an option. https://github.com/JakubStats/refitME/blob/master/inst/refitME_tutorial.pdf

Apr 04 '25 02:04 markbneal

@markbneal 'eivtools' was archived only a few days ago. Surprisingly, the problem with the automatic CRAN check is only a NOTE about the documentation. It should be easy to fix. The source is in GitHub but the author has not been active in three years, and then only changed the e-mail address. Last significant commit seems to be from five years ago. 'eivtools' was downloaded 119 times during the last month. 'refitME' looks interesting as the user interface seems good and it can be applied to different models. It is only slightly more popular than 'eivtools' based on the dowloads, 144 in the last month. It is also triggering a NOTE in CRAN. Once again, something very easy to fix. 'eive' is a little more popular at 234 downloads but handles only linear regression and I cannot understand how it estimates the measuring error variance (maybe I should read the paper they cite). Taking into account measurement errors seems to me something that should be useful more widely. For example 'lmodel2' for major axis regression has more than 13000 downloads per month...

Looking at 'refitME' it takes a model object, adjusts the estimates, and returns a similar model with the adjusted values. This new object is of a class derived from the class of the model received as input. So, for a lm fit it returns an lm object. I will think how to implement support for this approach... at the moment it seems not too difficult. Possibly can be incorporated to stat_poly_line() and stat_poly_eq().

Thanks for looking into all these alternatives!

I will let you know when I have something for you to test or more questions...

Apr 08 '25 21:04 aphalo

There is an article published in PlosONE in 2023 about 'refitME'.

There is a package called 'mecor' with an article published in Computer Methods and Programs in Biomedicine in 2021. This package only supports linear regression models, but with many different corrections approaches.

Anyway, 'refitME' still seems to be the easiest one for use in 'ggpmisc'.

Apr 09 '25 13:04 aphalo

My attempt to get refitME() working has failed because of what at first sight seems to be a bug in 'refitME' affecting how data are accessed when extracting them from the model fit object. MCEMfit_glm() fails when called from within another function.

I defined a function to fit the model and refit it.

lm_EIV <- function(formula, data, ..., sigma.sq.u) {
   fm <- lm(formula = formula, data = data, ...)
   MCEMfit_glm(mod = fm, family = "gaussian", sigma.sq.u = sigma.sq.u)
}

This function triggers an error, while the same two statements work if entered individually at the R command prompt. The error message is:

Error in mod$data[, -c(1, which(names(mod$data) %in% names.w))] : 
  object of type 'closure' is not subsettable
Called from: as.data.frame(mod$data[, -c(1, which(names(mod$data) %in% names.w))])

Changing the definition of the wrapper function into

lm_EIV <- function(formula, data, ..., sigma.sq.u) {
   my.data <<- data
   fm <- lm(formula = formula, data = my.data, ...)
   MCEMfit_glm(mod = fm, family = "gaussian", sigma.sq.u = sigma.sq.u)
}

allows it to work when called at the R prompt, but fails when called by another function. Making a copy of the data in the global environment is anyway an uggly/smelly kludge.

So, we need to find a workaround for defining this function in a way that can be called by a statistic or try to get this fixed in package 'refitME', I will prepare a proper reprex and raise an issue.

If this function could be made to work, stat_poly_line() and stat_poly_eq() could be used for EIV models, unless some other unexpected difficulty appears.

Apr 12 '25 21:04 aphalo

@markbneal While trying to get EIV working I ended implementing support for nlme:gls(), i.e, generalized least squares, in stat_poly_eq() and stat_poly_line(). So now there is support for variance covariates through weights and not yet tested, autocorrelated error variation.

Apr 13 '25 20:04 aphalo

So, we need to find a workaround for defining this function in a way that can be called by a statistic or try to get this fixed in package 'refitME', I will prepare a proper reprex and raise an issue.

https://github.com/JakubStats/refitME looks like the place to raise an issue. Apologies, I don't think I could make a neat reprex to demonstrate the problem.

Apr 13 '25 23:04 markbneal

@markbneal While trying to get EIV working I ended implementing support for nlme:gls(), i.e, generalized least squares, in stat_poly_eq() and stat_poly_line(). So now there is support for variance covariates through weights and not yet tested, autocorrelated error variation.

Nice, I'm sure allowing for GLS will come in handy for someone!

Apr 13 '25 23:04 markbneal

Raised issue for 'refitME' https://github.com/JakubStats/refitME/issues/5

Apr 14 '25 09:04 aphalo

Hi @markbneal The author of 'refitME' has provided a temporary fix to get 'refitME' working with 'ggpmisc'. He will also try to find a proper fix and update his package. Meanwhile, the tempory fix can be used to achieve what you suggested in this issue. Let's see if this approach works well before considering alternatines, as this does not require any change to 'ggpmisc'.

Meanwhile, I worked on issue #39 and added support for resistant regression with 'MASS::lqs()'.

Apr 15 '25 19:04 aphalo

Meanwhile, I worked on issue #39 and added support for resistant regression with 'MASS::lqs()'.

Interesting, I am familiar with robust regression, but resistant was new to me, though a pretty logical extension. Any thoughts on how we might pass the method to the text, in the case when on the same graph we have different techniques, eg ols, eiv, quantile, gls etc.

Apr 15 '25 20:04 markbneal

Yes, stat_poly_eq(), stat_ma_eq() and stat_quant_eq() add a method.label to the returned data object. For example, mapping = use_label("eq", "method") does the trick.

library(ggpmisc)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq(use_label("eq", "method"),
               label.x = "right")

^{Created on 2025-04-16 with reprex v2.1.1}

Apr 16 '25 16:04 aphalo

Already there! I'll make sure my colleagues know, as they often use quantile or ols, and now will be using eiv for certain jobs, so will be a good habit for them to add method. Cheers, M

Apr 17 '25 10:04 markbneal

@markbneal I forgot to mention that method labels work correctly only in the version now in R-Universe or here at GitHub. In the CRAN version they were not set when the argument to method was not a character string.

The method labels now include "method: ", I cannot make up my mind of whether to keep this text or not, as it seems to sometimes get in the way. What do you think?

Now robustbase::lmrob() also works. This function returns "robustness weights" separately from the "prior weights" passed as arguments in the call. In the case of MASS::rlm() both types of weights are also available. Returning one or the other in a variable called "weights" was a bad approach that managed to even confuse myself. So stat_fit_deviations() now returns both weights and robustness weights. Package 'robustbase' seems to be a "heavy-weight" with some big names behind it and comprehensive in scope. I "discovered" it only yesterday.

Now stat_fit_residuals(), stat_fit_deviations() and stat_fit_fitted() also work when weights are not available by returning NAs instead of stopping with an error message. In cases where it is possible to be sure that the missing weights are all ones, this is the value used instead of NAs.

I added some examples to the on-line documentation in an "article". This article is still very rough and will be chaning.

Apr 17 '25 20:04 aphalo

@markbneal I see that the fitted model object returned by lqs() has a field named bestone with indexes to observations. However, when I run the code with an example, bestone points to only two observations. Do you know how to interpret this? Are only for two observations the weights equal to 1 and 0 for all others? It surprises me...

Apr 18 '25 17:04 aphalo

@markbneal It does not look like bestone is of any use. However, robustbase::ltsReg() seems to be similarly resistant and does return lts.weights that actually make sense. I updated the code to support this, and as there is no predict() method for these objects, stat_poly_line() now calls fitted() in cases like this. I added one example to the on-line documentation in the "article".

Apr 19 '25 09:04 aphalo

@markbneal I haven't made progress with EIV, but I noticed while implementing support for 'smatr' (#59) in stat_poly_line() and stat_poly_eq() that ma() and sma() functions from 'smatr' accept as input a matrix with measurement error estimates on x and/or y, I haven't tried this yet, as I got MA and SMA working only some minutes ago. Support for 'smatr' in 'ggpmisc' is available from R-Universe and will stay there for some time as the previous version of 'ggpmisc' was accepted by CRAN only a week or so ago. There are surely still some rough edges. I added some new examples in an on-line only article on MA regression at https://docs.r4photobiology.info/ggpmisc/articles/major-axis-regression.html

I haven't heard from the author of 'refitME' and there has been no activity in the 'refitME' github repository... So, no progress on EIV.

Jul 12 '25 18:07 aphalo