Lagged variables in formula

Open ralpherns opened this issue 8 years ago • 1 comments

Would it be possible to add support for lagged variables and weights to patsy? For an econometrician like me, it would be a great help not to depend on external tools for formulas like:

varalpha varalpha(-1) (((varbac*2.1)+(varbac(-1)*1.4)+(varbac(-2)*0.2)+(varbac(-3)*0.3))/4) (((varac*0.1)+(varac(-1)*3.5)+(varac(-2)*0.1)+(varac(-3)*0.3))/4) (((varep*2.4)+(varep(-1)*0.1)+(varep(-2)*0.5))/3) (((varrl*2.2)+(varrl(-2)*0.7)+(varrl(-3)*0.1))/3) (((varrs*0.8)+(varrs(-1)*1.2))/2) (((vartc*0.9)+(vartc(-1)*1.8)+(vartc(-2)*0.1)+(vartc(-3)*1.2))/4) varaah (((vargs*1.1)+(vargs(-1)*1.9)+(vargs(-2)*0.7)+(vargs(-3)*0.3))/4) (((vargc*0.1)+(vargc(-1)*3.6)+(vargc(-2)*0.1)+(vargc(-3)*0.2))/4)

that I used to work with in EViews.

Python with statsmodels, pandas and patsy is a very powerful working environment, but without this feature it is of limited use to econometricians.

Thanks

ralpherns, May 04 '17 09:05

It should be simple to write a function that shifts the data, called lag or whatever you like, and use it with syntax like

varalpha ~ 1 + lag(varalpha,1)

since patsy drops rows with missing observations by default. The function lag would look something like

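import numpy as np
import pandas as pd

def lag(x, n):
    # Shift x by n observations; the first n values become NaN.
    if n == 0:
        return x
    if isinstance(x, pd.Series):
        return x.shift(n)

    # Array path (assumes n > 0): copy as float so NaN can be stored.
    x = np.array(x, dtype=float)
    x[n:] = x[:-n].copy()
    x[:n] = np.nan
    return x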

and it could be used like this:

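import numpy as np
from patsy import dmatrices

y = np.random.randn(100)
df = pd.DataFrame({'y': y})
lhs, rhs = dmatrices('y ~ 1 + lag(y, 1)', df, return_type='dataframe')

# After the NaN row is dropped, lagged y at t must equal y at t-1.
print(np.all(lhs.iloc[:-1, 0].values == rhs.iloc[1:, 1].values))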

True
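
The weighted terms from the original request could probably be handled the same way. The following is only a rough sketch under some assumptions: it wraps the arithmetic in patsy's I() so that * and + are read as ordinary Python operators rather than formula operators, it uses randomly generated data under the column names varalpha and varbac, and it redefines a simplified pandas-only lag so that the snippet runs on its own. Only one of the weighted regressors is shown.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

def lag(x, n):
    # Simplified stand-in for the lag function above: shift by n rows,
    # leaving NaN in the first n positions.
    return pd.Series(x).shift(n)

# Illustrative random data; the column names are borrowed from the request.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "varalpha": rng.standard_normal(50),
    "varbac": rng.standard_normal(50),
})

# One weighted regressor from the request:
# ((varbac*2.1) + (varbac(-1)*1.4) + (varbac(-2)*0.2) + (varbac(-3)*0.3)) / 4
# I(...) makes patsy evaluate the contents as plain Python arithmetic.
formula = (
    "varalpha ~ 1 + lag(varalpha, 1) + "
    "I((varbac*2.1 + lag(varbac, 1)*1.4"
    " + lag(varbac, 2)*0.2 + lag(varbac, 3)*0.3) / 4)"
)
lhs, rhs = dmatrices(formula, df, return_type="dataframe")

# Cross-check: rebuild the weighted term directly with pandas.
b = df["varbac"]
expected = (b*2.1 + b.shift(1)*1.4 + b.shift(2)*0.2 + b.shift(3)*0.3) / 4
weighted_ok = np.allclose(rhs.iloc[:, 2].values, expected.dropna().values)
print(rhs.shape, weighted_ok)
```

The longest lag is 3, so the first three rows contain missing values and are dropped by patsy before the design matrices are returned. The remaining weighted terms would follow the same pattern, each as its own I(...) regressor.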

bashtage, Jul 03 '17 10:07