Pruning uses the maximum number of lags, not the selected number of lags

Open • paullabonne opened this issue 8 months ago • 1 comment

Hello,

Thank you very much for this work.

I believe there is an issue with the pruning in the VARLiNGAM code. Two models that end up with the same number of selected lags and are trained on the same data should give identical results, regardless of the initial maximum number of lags. While this is true if prune=False, the adjacency matrices can differ if prune=True. See:

import pandas as pd
from lingam import VARLiNGAM

# Sample data shipped with the lingam repository
X = pd.read_csv('examples/data/sample_data_var_lingam.csv')

model1 = VARLiNGAM(lags = 10)
model1.fit(X)
print(f"model1 has {model1._lags} lags") # 1 lag 

model2 = VARLiNGAM(lags = 1)
model2.fit(X)
print(f"model2 has {model2._lags} lags") # 1 lag 

print(model1.adjacency_matrices_ - model2.adjacency_matrices_)

Running the same code with prune=False will give identical adjacency matrices.

This issue arises because the number of lags used in the lasso procedure is the initial maximum number of lags, not the one selected with the criterion.

Moving this line https://github.com/cdt15/lingam/blob/1495ba515024a27d0ea0cabbc2e15d4aee76823a/lingam/var_lingam.py#L109

before this one https://github.com/cdt15/lingam/blob/1495ba515024a27d0ea0cabbc2e15d4aee76823a/lingam/var_lingam.py#L106

should solve the problem.
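To illustrate why the order of those two lines matters, here is a minimal toy sketch. It is not the lingam source: `ToyLagModel`, `_select_lags`, `_prune`, and the `fixed` flag are hypothetical names made up for this example, and the lag selection is hard-coded to 1. It only assumes, as described above, that the pruning step reads the lag count from the instance attribute.

```python
class ToyLagModel:
    """Hypothetical stand-in for the fit/prune ordering; not lingam's code."""

    def __init__(self, lags, fixed):
        self._lags = lags    # initial maximum number of lags
        self._fixed = fixed  # True: store the selected lag before pruning
        self.lags_seen_by_pruning = None

    def _select_lags(self, X):
        # Placeholder for criterion-based selection; pretend 1 lag is best.
        return 1

    def _prune(self, X):
        # Assumption: pruning reads the lag count from the instance attribute.
        self.lags_seen_by_pruning = self._lags

    def fit(self, X):
        selected = self._select_lags(X)
        if self._fixed:
            self._lags = selected  # store first, then prune
            self._prune(X)
        else:
            self._prune(X)         # prune still sees the initial maximum
            self._lags = selected
        return self


buggy = ToyLagModel(lags=10, fixed=False).fit(None)
patched = ToyLagModel(lags=10, fixed=True).fit(None)

print(buggy.lags_seen_by_pruning)    # 10
print(patched.lags_seen_by_pruning)  # 1
```

In both orderings the model ends up reporting 1 lag, which is why the problem is only visible in the pruned adjacency matrices.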

Apologies if I’ve misunderstood something.

Many thanks, Paul

May 13 '25 22:05 paullabonne

Hi, @paullabonne .

Thanks for reporting and analyzing this problem. As you pointed out, we have confirmed that the reference to the number of lags is incorrect when pruning edges. I'll fix it in the next few days (or you can send us a pull request).

May 15 '25 04:05 ikeuchi-screen