patsy
Lagged variables in formula
Would it be possible to add support for lagged variables and weights to patsy? For an econometrician like me, it would be a great help not to depend on external tools to work with formulas like:
varalpha varalpha(-1) (((varbac*2.1)+(varbac(-1)*1.4)+(varbac(-2)*0.2)+(varbac(-3)*0.3))/4) (((varac*0.1)+(varac(-1)*3.5)+(varac(-2)*0.1)+(varac(-3)*0.3))/4) (((varep*2.4)+(varep(-1)*0.1)+(varep(-2)*0.5))/3) (((varrl*2.2)+(varrl(-2)*0.7)+(varrl(-3)*0.1))/3) (((varrs*0.8)+(varrs(-1)*1.2))/2) (((vartc*0.9)+(vartc(-1)*1.8)+(vartc(-2)*0.1)+(vartc(-3)*1.2))/4) varaah (((vargs*1.1)+(vargs(-1)*1.9)+(vargs(-2)*0.7)+(vargs(-3)*0.3))/4) (((vargc*0.1)+(vargc(-1)*3.6)+(vargc(-2)*0.1)+(vargc(-3)*0.2))/4)
which I used to work with in EViews.
Python with statsmodels, pandas and patsy is a very powerful working environment, but without this feature it is of limited use to econometricians.
Thanks
It should be simple to write a function that shifts the data, called lag or whatever you want, and use it with syntax like
varalpha ~ 1 + lag(varalpha,1)
since patsy drops rows with missing observations by default. The function lag would look something like
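import numpy as np
import pandas as pd

def lag(x, n):
    # Shift x forward by n periods (n >= 0); the first n values become NaN.
    if n == 0:
        return x
    if isinstance(x, pd.Series):
        return x.shift(n)
    x = np.asarray(x, dtype=float)  # float so that NaN can be stored
    out = np.empty_like(x)
    out[n:] = x[:-n]
    out[:n] = np.nan
    return out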
and it can be used like this:
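import numpy as np
from patsy import dmatrices
y = np.random.randn(100)
df = pd.DataFrame({'y': y})
lhs, rhs = dmatrices('y ~ 1 + lag(y, 1)', df, return_type='dataframe')
# Row 0 is dropped because its lag is NaN; the lag column should equal y one period earlier.
print(np.all(lhs.iloc[:-1, 0].values == rhs.iloc[1:, 1].values))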
True
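The weighted terms in the original question could probably be handled the same way. Below is a rough sketch of a helper, here called wlag (a made-up name, not part of patsy), that takes a mapping from lag to weight, sums the weighted lags and divides by the number of terms, as the EViews expressions do. The toy data is only for illustration.

```python
import numpy as np
import pandas as pd


def wlag(x, weights):
    """Weighted average of lagged copies of x.

    `weights` maps a lag (0 = current period) to its coefficient.
    The sum is divided by the number of terms, mirroring the "/4",
    "/3" and "/2" divisors in the EViews expression. Rows without
    enough history come out as NaN.
    """
    s = pd.Series(np.asarray(x, dtype=float))
    total = sum(w * s.shift(k) for k, w in weights.items())
    return (total / len(weights)).to_numpy()


# Toy series standing in for varbac from the question.
varbac = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
w_bac = {0: 2.1, 1: 1.4, 2: 0.2, 3: 0.3}
result = wlag(varbac, w_bac)
print(result)
```

Since patsy looks up names in the calling environment, a formula along the lines of 'varalpha ~ lag(varalpha, 1) + wlag(varbac, w_bac)' should then work, with the incomplete leading rows dropped as missing. Skipped lags, as in the varrl term, would be written as {0: 2.2, 2: 0.7, 3: 0.1}.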