Pruning uses the maximum number of lags, not the selected number of lags
Hello,
Thank you very much for this work.
I believe there is an issue with the pruning in the VARLiNGAM code. Two models that end up with the same number of lags and are trained on the same data should give identical results, regardless of the initial maximum number of lags. This holds with prune=False, but with prune=True the adjacency matrices can differ. See:
import pandas as pd
from lingam import VARLiNGAM

X = pd.read_csv('examples/data/sample_data_var_lingam.csv')

model1 = VARLiNGAM(lags=10, prune=True)  # up to 10 lags; the criterion selects the order
model1.fit(X)
print(f"model1 has {model1._lags} lags")  # 1 lag

model2 = VARLiNGAM(lags=1, prune=True)
model2.fit(X)
print(f"model2 has {model2._lags} lags")  # 1 lag

# Both models use 1 lag, yet the pruned adjacency matrices differ
print(model1.adjacency_matrices_ - model2.adjacency_matrices_)
Running the same code with prune=False will give identical adjacency matrices.
This issue arises because the number of lags used in the lasso procedure is the initial maximum number of lags, not the number selected by the criterion.
Moving this line https://github.com/cdt15/lingam/blob/1495ba515024a27d0ea0cabbc2e15d4aee76823a/lingam/var_lingam.py#L109
before this one https://github.com/cdt15/lingam/blob/1495ba515024a27d0ea0cabbc2e15d4aee76823a/lingam/var_lingam.py#L106
should solve the problem.
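To make the ordering problem concrete, here is a minimal NumPy sketch under simplifying assumptions; it is not lingam's code. The helpers `lagged_design`, `prune_lagged_coefs`, `fit_pruned` and `pick_one_lag` are hypothetical, OLS with a hard threshold stands in for the adaptive lasso, the instantaneous matrix B0 is ignored, and the lag criterion is stubbed to always return 1. With `use_selected_lags=False` the sketch mimics the reported behaviour: the pruning regression is built with the maximum number of lags and only the first `selected` lag matrices are kept.

```python
import numpy as np


def lagged_design(X, lags):
    """Return targets X[t] and stacked regressors [X[t-1], ..., X[t-lags]]."""
    n = X.shape[0]
    Y = X[lags:]
    Z = np.hstack([X[lags - k : n - k] for k in range(1, lags + 1)])
    return Y, Z


def prune_lagged_coefs(X, lags, threshold=0.1):
    """Hypothetical pruning step: OLS on the lagged design, then zero out small
    coefficients. A stand-in for the adaptive lasso, not lingam's implementation."""
    Y, Z = lagged_design(X, lags)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # shape (lags * d, d)
    d = X.shape[1]
    # B[k][i, j] is the effect of x_j(t - k - 1) on x_i(t)
    B = coef.T.reshape(d, lags, d).transpose(1, 0, 2).copy()
    B[np.abs(B) < threshold] = 0.0
    return B


def fit_pruned(X, max_lags, select_lags, use_selected_lags=True):
    """Select the lag order first, then prune with that order.

    use_selected_lags=False reproduces the reported bug: the pruning step
    regresses on max_lags lags and only the first `selected` matrices are kept.
    """
    selected = select_lags(X, max_lags)
    lags_for_pruning = selected if use_selected_lags else max_lags
    B = prune_lagged_coefs(X, lags_for_pruning)
    return B[:selected]


def pick_one_lag(X, max_lags):
    """Hypothetical stand-in for the lag selection criterion: always returns 1."""
    return 1


# Toy VAR(1) data with 3 variables
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.0, 0.0], [0.3, 0.4, 0.0], [0.0, 0.2, 0.6]])
X = np.zeros((500, 3))
for t in range(1, 500):
    X[t] = A @ X[t - 1] + rng.standard_normal(3)

fixed_10 = fit_pruned(X, 10, pick_one_lag)
fixed_1 = fit_pruned(X, 1, pick_one_lag)
buggy_10 = fit_pruned(X, 10, pick_one_lag, use_selected_lags=False)

print(np.array_equal(fixed_10, fixed_1))  # True: no longer depends on max_lags
print(np.array_equal(buggy_10, fixed_1))  # differs: pruning regressed on 10 lags
```

Once the selected order is known before pruning, the maximum number of lags only affects the selection step, which is the behaviour I would expect from VARLiNGAM.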
Apologies if I’ve misunderstood something.
Many thanks, Paul
Hi @paullabonne,
Thanks for reporting and analyzing this problem. As you pointed out, we have confirmed that the reference to the number of lags is incorrect when pruning edges. I'll fix it in the next few days (or you can send us a pull request).