Strange issue related to variable names and lmer

Open paulcbogdan opened this issue 3 years ago • 1 comments

Fitting a model with Lmer raises the following error:

Traceback (most recent call last):
  File "E:/Users/USER/PycharmProjects/PROJECT/sentiment_anal.py", line 297, in <module>
    do_lmer(df_all)
  File "E:/Users/USER/PycharmProjects/PROJECT/sentiment_anal.py", line 196, in do_lmer
    summary = mod.fit(REML=True)
  File "C:\Users\USER\Anaconda3\lib\site-packages\pymer4\models\Lmer.py", line 539, in fit
    out_summary, out_rownames = estimates_func(self.model_obj)
  File "C:\Users\USER\Anaconda3\lib\site-packages\rpy2\robjects\functions.py", line 198, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "C:\Users\USER\Anaconda3\lib\site-packages\rpy2\robjects\functions.py", line 125, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "C:\Users\paulc\Anaconda3\lib\site-packages\rpy2\rinterface_lib\conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "C:\Users\paulc\Anaconda3\lib\site-packages\rpy2\rinterface.py", line 680, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 2, 0

The error is triggered by this snippet:

    from pymer4.models import Lmer

    formula = 'M_comp ~ 1 + entropy_sign + (1 + entropy_sign | subr)'
    mod = Lmer(formula, data=df_vars)
    summary = mod.fit(REML=True)

Strangely, the issue seems to depend on the names of the dataframe's columns. The two pictures below show what I mean: renaming the column "entropy_sign" to "PE" makes the error disappear. I think I have also roughly located the problem in this piece of code in Lmer.py:

                rstring = (
                    """
                    function(model){
                    out.coef <- data.frame(unclass(summary(model))$coefficients)
                    out.ci <- data.frame(confint(model,method='"""
                    + conf_int
                    + """',nsim="""
                    + str(n_boot)
                    + """))
                    n <- c(rownames(out.ci))
                
                
                    idx <- max(grep('sig',n))
                    print('idx = ')
                    print(idx)
                    # Something goes wrong here: with "entropy_sign" as a predictor, idx needs to be 4 (it comes out as 6 otherwise)
                    out.ci <- out.ci[-seq(1:idx),]
                    out <- cbind(out.coef,out.ci)
                    list(out,rownames(out))
                    }
                """
                )
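One possible explanation, offered as a sketch rather than a confirmed diagnosis: grep('sig', n) matches any row name containing "sig", and "entropy_sign" contains that substring. If the confint row names look like the list assumed below (lme4-style .sig01 to .sigma, followed by the fixed effects), then max(grep(...)) lands on the last fixed-effect row, every row is dropped from out.ci, and cbind sees 2 rows against 0. The Python below only mimics the R indexing on an assumed list of names. The names ci_rownames, last_match_index, and drop_leading_rows are hypothetical, and the anchored pattern ^\.sig is a suggestion, not pymer4's actual fix.

```python
import re

# ASSUMPTION: row names that confint() would return for this model:
# three random-effect parameters, the residual sigma, then the fixed effects.
ci_rownames = [".sig01", ".sig02", ".sig03", ".sigma", "(Intercept)", "entropy_sign"]


def last_match_index(names, pattern):
    """Mimic R's max(grep(pattern, names)); returns a 1-based index."""
    return max(i for i, name in enumerate(names, start=1) if re.search(pattern, name))


def drop_leading_rows(names, idx):
    """Mimic out.ci[-seq(1:idx), ]: drop the first idx rows."""
    return names[idx:]


# Loose pattern, as in the snippet above: also matches "entropy_sign".
idx_loose = last_match_index(ci_rownames, "sig")
# Anchored pattern (hypothetical fix): matches only the variance parameters.
idx_anchored = last_match_index(ci_rownames, r"^\.sig")

print(idx_loose, drop_leading_rows(ci_rownames, idx_loose))
# -> 6 []
print(idx_anchored, drop_leading_rows(ci_rownames, idx_anchored))
# -> 4 ['(Intercept)', 'entropy_sign']
```

Under these assumptions the loose pattern gives idx = 6 and leaves no confidence-interval rows, while the anchored one gives idx = 4 and keeps the two fixed-effect rows. This would also explain why renaming the column to "PE" avoids the error.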

Here are the examples:

Does not work: image

Works: image

This is no longer blocking me, since I can simply rename the variables, but it may be of interest. If it is a genuine bug, I don't know enough R or rpy2 to contribute a fix myself.

paulcbogdan • Jan 21 '22 13:01

Hmm, bizarre. I haven't run into this before, and I can't reproduce it just by renaming variables in the included sample data (i.e. adding an _). It also seems to be happening on the R side of things. @paulcbogdan can you share platform details, i.e. OS, Python version, R version, pymer4 version, etc.?

If you installed via conda, running conda list should print the package versions.

Also, have you tried running the model directly in R, or calling R's confint function directly? I'm trying to figure out whether the issue lies in the rpy2 conversion or within R itself.

ejolly • Jan 31 '22 19:01

Closing due to time lapse. Feel free to reopen if encountered again.

ejolly • Sep 24 '22 04:09