Saving fixest objects

Open tlcaputi opened this issue 3 years ago • 12 comments

Is there a way to save a fixest object as an .RDS file so that I can use it later with all the possible fitstat options?

Say I want to save this fixest object:

library(fixest)
library(haven)

df = read_dta("http://dss.princeton.edu/training/Panel101.dta")
outcome_vars = c("y", "y_bin")
treatment_vars = c("x1")
controls = c("country", "year")

mod = feols(
    .[outcome_vars] ~ .[treatment_vars] | .[controls],
    data = df,
    cluster = ~country,
    panel.id = ~country + year,
    fixef.rm = "none"
)

etable(list(mod$y, mod$y_bin), fitstat = ~ . + my)
# This reports the regression results as expected
saveRDS(mod, "test.RDS")

When I try to access it later, I can't use most fitstat options, e.g., my:

rm(list = ls()) # or new session
newmod = readRDS("test.RDS")
fixest::etable(list(newmod$y, newmod$y_bin), fitstat = ~ . + my)

# This gives the following error:
# Error in model.matrix.fixest(x, type = "lhs") : 
#  The argument 'data' must be a data.frame or a matrix.

Side Note: It seems to work as expected if I load the same dataset again in the new session, like this:

rm(list = ls())  # or new session
df = haven::read_dta("http://dss.princeton.edu/training/Panel101.dta")
newmod = readRDS("test.RDS")
fixest::etable(list(newmod$y, newmod$y_bin), fitstat = ~ . + my)

Thanks for your help! I'm a huge fan of the package.

tlcaputi avatar Sep 26 '22 04:09 tlcaputi

Hi, I am not 100% sure what fixest is doing in detail, but I am fairly certain that you have basically answered your own question: fixest does not store all input objects in the model object. If every object of class fixest had to carry its input data set along, one might run out of memory pretty quickly. What happens instead is that fixest stores information on the model call and its environment in object$call and object$call_env, and then fetches the input data from that environment whenever it is needed. This implies that loading the data before reading the .rds file should be safe, provided the data is in exactly the same shape as when the fixest model was estimated.
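
To make the mechanism described above concrete, here is a minimal sketch for inspecting what the model object keeps. It assumes the built-in mtcars data as a stand-in for the Panel101 data, and it checks for call_env by name instead of assuming it is present in every fixest version.

```r
library(fixest)

# Stand-in data: built-in mtcars replaces the Panel101 data from the thread
df = mtcars
est = feols(mpg ~ wt | cyl, data = df)

# The stored call refers to the data by name only
print(est$call)
print(est$call$data)   # the name used for `data` in the call

# Check for the stored environment without assuming it is always there
print("call_env" %in% names(est))
```

Because only the name of the data is recorded in the call, an object called df has to be available again when the model is reloaded, which matches the side note in the original post.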

s3alfisc avatar Sep 26 '22 20:09 s3alfisc

Yes, I figured it was something like that.

Is there a way to have the fixest object include those auxiliary objects so it can be saved and reused?

Thanks so much!

tlcaputi avatar Sep 26 '22 20:09 tlcaputi

I don't think there is inbuilt functionality, but I might be mistaken. One simple workaround would be to assign the fixest object and the associated data to a list, and to save that list as an .rds file?
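
A minimal sketch of that workaround, assuming the built-in mtcars data as a stand-in and a temporary file path. The list element names model and data are arbitrary choices for illustration, not a fixest convention.

```r
library(fixest)

# Stand-in data: built-in mtcars replaces the Panel101 data from the thread
df = mtcars
mod = feols(mpg ~ wt | cyl, data = df)

# Save the model together with the data it was estimated on
path = tempfile(fileext = ".rds")
saveRDS(list(model = mod, data = df), path)

# Later, or in a new session: restore both parts
bundle = readRDS(path)
df = bundle$data        # same name as in the original call
newmod = bundle$model

etable(newmod, fitstat = ~ . + my)
```

The data is restored under the name df because that is the name recorded in the model call. In a fresh session the assignment should happen in the environment where the model was originally estimated, typically the global environment.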

s3alfisc avatar Sep 27 '22 11:09 s3alfisc

Hi everyone: this is currently not possible. It is similar to #340. There are hacks to do it, but they are not straightforward (one such hack is described in #340).

There is work under way to solve this problem. It will be there for sure, but not before Jan/Feb, sorry!

lrberge avatar Sep 27 '22 11:09 lrberge

hi! I got the same issue, is there a way I can take this issue and solve it in a PR before next monday?

pachadotdev avatar Jan 11 '23 06:01 pachadotdev

Hi @lrberge I was testing some options, one could be to pass the training data in the exported object, like this https://github.com/pachadotdev/eflm/blob/main/R/eglm.R#L202, and then compute fit statistics by calling fit$data as default. What do you think?

pachadotdev avatar Jan 12 '23 05:01 pachadotdev

I started to mimic some glm() behaviour, but with the difference that the user needs to specify the option to put the training dataset in the returned object

https://github.com/pachadotdev/fixest2/commit/973e5eca3ae544ccef9032af5b02d04b3e3880b8

this is not yet ready, when it works well, I'll put the changes in a new branch and send a PR

pachadotdev avatar Jan 13 '23 20:01 pachadotdev

Hi @pachadotdev, please don't :-) I've started to work this out a while ago and I made some major overhauls linked to this issue. There's no need for the PR. I kind of have a "research" semester starting in February so I'll finish this business at that time.

lrberge avatar Jan 13 '23 20:01 lrberge

sure, I'll email you

pachadotdev avatar Jan 13 '23 20:01 pachadotdev

Hi, with a huge delay: note that there's the data.save argument which, if TRUE, makes the results consistent in the situation described in the initial post. Note though that it creates a copy of the full original data set, so it's handy only for small data sets.
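
A minimal sketch of how data.save might be used for the case in the initial post, assuming a fixest version that has the argument and using the built-in mtcars data as a stand-in. The temporary file path is an illustration only.

```r
library(fixest)

# Stand-in data: built-in mtcars replaces the Panel101 data from the thread
df = mtcars

# data.save = TRUE keeps a copy of the estimation data inside the object
mod = feols(mpg ~ wt | cyl, data = df, data.save = TRUE)

path = tempfile(fileext = ".rds")
saveRDS(mod, path)

# Simulate a fresh session in which the original data is gone
rm(df, mod)
newmod = readRDS(path)

# Per the comment above, fit statistics that need the data should now work
etable(newmod, fitstat = ~ . + my)
```

Since the full data set travels with the object, the saved file grows with the data, so comparing file.size(path) with and without data.save is a quick way to judge the cost.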

lrberge avatar Feb 07 '24 17:02 lrberge

dear @lrberge

sorry for the delay, i had a big surgery and i'm typing with 1 hand

glad to see that some parts of my old pr are somehow reflected here, this is amazing

pachadotdev avatar Feb 07 '24 22:02 pachadotdev

On the OP, there's a massive overhaul on how fitstats work enabling an effective save of an estimation at the smallest size. But it's still WIP.

lrberge avatar Feb 07 '24 23:02 lrberge