IndexError in MGCX
My issue is about an IndexError that appears when using MGCX.test(). The error is originally raised by scipy's multiscale_graphcorr (see the stack trace).
I'm very surprised that this depends on the random number generation, i.e. it fails for some seeds but not for others. Increasing the number of replications (reps) seems to increase the probability that the error occurs: setting reps=1000 makes seed 16 fail as well.
The seed dependence actually makes me think I messed up somewhere, but I can't figure out where.
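One way to see why more reps makes a failure more likely: the permutation test recomputes the statistic once per replication, and a single failing permutation aborts the whole call. Assuming each permutation independently triggers the error with some small probability p (the 0.002 below is a made-up value, not something I measured), the chance of at least one failure is 1 - (1 - p)^reps:

```python
def prob_any_failure(p: float, reps: int) -> float:
    """Probability that at least one of `reps` independent permutations fails,
    assuming each one fails independently with the same probability `p`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 1.0 - (1.0 - p) ** reps


# Hypothetical per-permutation failure rate, for illustration only.
for reps in (100, 1000):
    print(reps, round(prob_any_failure(0.002, reps), 3))
```

Under that assumption, going from 100 to 1000 reps turns a rare failure into a likely one, which would match seed 16 passing at reps=100 and failing at reps=1000.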
Reproducing code example:
import sys
import pandas as pd
import numpy as np
from hyppo.time_series import MGCX


def test(seed):
    print(f"Testing seed {seed}")
    reps = 100
    df = pd.DataFrame([[1, 1],
                       [2, 1],
                       [3, 1],
                       [4, 4],
                       [5, 5],
                       [6, 6]], columns=["a", "b"])
    i_test = MGCX()
    rstate = np.random.RandomState(seed)
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
    print(f"stat: {stat}, pval: {pval}, d: {d}")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        seed = int(sys.argv[1])
        test(seed)
    else:
        test(16)
        test(0)
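To map out which seeds fail, a small loop can run the repro over many seeds and collect the ones that raise IndexError. This is a hypothetical helper (`failing_seeds` is not part of hyppo); it only assumes a callable that takes a seed, such as the test() function above:

```python
from typing import Callable, Iterable, List, Type


def failing_seeds(
    run: Callable[[int], object],
    seeds: Iterable[int],
    exc: Type[BaseException] = IndexError,
) -> List[int]:
    """Call run(seed) for each seed and return the seeds that raised `exc`.

    Exceptions of any other type propagate, so unrelated bugs are not hidden.
    """
    failures = []
    for seed in seeds:
        try:
            run(seed)
        except exc:
            failures.append(seed)
    return failures
```

For example, failing_seeds(test, range(20)) would run the repro for seeds 0 to 19 and return the ones that hit the IndexError, which makes it easy to compare failure counts at reps=100 versus reps=1000.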
Error message
Testing seed 16
stat: 0.886004262777708, pval: 0.0297029702970297, d: {'opt_lag': 0, 'opt_scale': [6, 4]}
Testing seed 0
Traceback (most recent call last):
  File "/home/f/TRAVAIL/csod/misc/hyppo/problem.py", line 32, in <module>
  AIL/csod/misc/hyppo/problem.py", line 22, in test
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 194, in test
    stat, pvalue, stat_list = super(MGCX, self).test(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 130, in test
    Parallel(n_jobs=workers)(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 159, in _perm_stat
    perm_stat = calc_stat(distx, permy)[0]
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 106, in statistic
    stat, opt_lag = compute_stat(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/_utils.py", line 93, in compute_stat
    indep_test_stat = indep_test.statistic(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/independence/mgc.py", line 161, in statistic
    mgc = multiscale_graphcorr(distx, disty, compute_distance=None, reps=0)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6490, in multiscale_graphcorr
    stat, stat_dict = _mgc_stat(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6541, in _mgc_stat
    stat = stat_mgc_map[m - 1][n - 1]
IndexError: index 5 is out of bounds for axis 0 with size 1
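The error message itself is consistent with a lookup whose indices are swapped on a map that has a single row. The sketch below uses only NumPy and does not call SciPy; it assumes (not verified against the SciPy source here) that `_mgc_stat` unpacks `n, m` from `stat_mgc_map.shape` in that order. Under that assumption, a map of shape (1, 6) reproduces the exact message from the traceback:

```python
import numpy as np

# Placeholder MGC map with a single row; only its shape (1, 6) matters here.
stat_mgc_map = np.zeros((1, 6))

# Assumption: the sizes are unpacked in shape order, so n == 1 and m == 6.
n, m = stat_mgc_map.shape

try:
    # Same expression as in the traceback: row index m - 1 == 5, but there is only 1 row.
    stat_mgc_map[m - 1][n - 1]
except IndexError as err:
    print(err)  # index 5 is out of bounds for axis 0 with size 1

# With the indices in shape order, the lookup stays in bounds (last cell of the map).
stat = stat_mgc_map[n - 1][m - 1]
print(stat)
```

If that reading is right, the failure would only show up for the permutations where the map collapses to a single row, which could explain why only some seeds trigger it.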
Version information
- OS: Arch Linux 6.6.7-arch1-1 (64-bit)
- Python version: 3.10
- Package versions: hyppo==0.4.0, scipy==1.11.4, joblib==1.3.2
Sorry for the late response; this only just came onto my radar. I'll take a look into this.