Index error in MGCX

Open Rayerdyne opened this issue 1 year ago • 1 comment

My issue is about an IndexError raised when calling MGCX.test(). The error originates in scipy's multiscale_graphcorr (see the stack trace below).

I'm very surprised that this depends on random number generation: it fails for some seeds but not for others. Increasing the number of replications (reps) seems to increase the probability of an error; for example, setting reps=1000 makes seed 16 fail as well. The seed dependence makes me think I messed up somewhere, but I can't see where.

Reproducing code example:

import sys

import pandas as pd
import numpy as np

from hyppo.time_series import MGCX

def test(seed):
    print(f"Testing seed {seed}")
    reps=100

    df = pd.DataFrame([[1, 1],
                       [2, 1],
                       [3, 1],
                       [4, 4],
                       [5, 5],
                       [6, 6]], columns=["a", "b"])

    i_test = MGCX()
    rstate = np.random.RandomState(seed)

    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
    print(f"stat: {stat}, pval: {pval}, d: {d}")

if __name__ == "__main__":
    if len(sys.argv) > 1:
        seed = int(sys.argv[1])
        test(seed)
    
    else:
        test(16)
        test(0)
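Since the failure depends on the seed, it can help to sweep a range of seeds and record which ones raise. The following is a minimal sketch, not part of the original report: `run_mgcx` and `find_failing_seeds` are hypothetical helper names, and `run_mgcx` simply repeats the call from the script above with the same data.

```python
def run_mgcx(seed, reps=100):
    """Run the MGCX test from the script above for a single seed."""
    # Imported lazily so find_failing_seeds can be used without hyppo installed.
    import numpy as np
    from hyppo.time_series import MGCX

    x = np.array([1, 2, 3, 4, 5, 6])
    y = np.array([1, 1, 1, 4, 5, 6])
    rstate = np.random.RandomState(seed)
    return MGCX().test(x, y, random_state=rstate, reps=reps)


def find_failing_seeds(run, seeds):
    """Return the seeds for which run(seed) raises IndexError."""
    failing = []
    for seed in seeds:
        try:
            run(seed)
        except IndexError:
            # Only the error from the report is caught; anything else propagates.
            failing.append(seed)
    return failing
```

Calling `find_failing_seeds(run_mgcx, range(50))` would then list the seeds below 50 that trigger the error. Passing a different `reps` (for example via `lambda s: run_mgcx(s, reps=1000)`) makes it possible to check how the failure rate changes with the number of replications.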

Error message

Testing seed 16
stat: 0.886004262777708, pval: 0.0297029702970297, d: {'opt_lag': 0, 'opt_scale': [6, 4]}
Testing seed 0
Traceback (most recent call last):
  File "/home/f/TRAVAIL/csod/misc/hyppo/problem.py", line 32, in <module>
AIL/csod/misc/hyppo/problem.py", line 22, in test
    stat, pval, d = i_test.test(df["a"].values, df["b"].values, random_state=rstate, reps=reps)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 194, in test
    stat, pvalue, stat_list = super(MGCX, self).test(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 130, in test
    Parallel(n_jobs=workers)(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/base.py", line 159, in _perm_stat
    perm_stat = calc_stat(distx, permy)[0]
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/mgcx.py", line 106, in statistic
    stat, opt_lag = compute_stat(
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/time_series/_utils.py", line 93, in compute_stat
    indep_test_stat = indep_test.statistic(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/hyppo/independence/mgc.py", line 161, in statistic
    mgc = multiscale_graphcorr(distx, disty, compute_distance=None, reps=0)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6490, in multiscale_graphcorr
    stat, stat_dict = _mgc_stat(x, y)
  File "/home/f/TRAVAIL/csod/misc/hyppo/.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6541, in _mgc_stat
    stat = stat_mgc_map[m - 1][n - 1]
IndexError: index 5 is out of bounds for axis 0 with size 1
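
One possible reading of the last frame, offered as an assumption rather than a confirmed diagnosis: the message is what numpy produces when a non-square array is indexed with its two dimensions swapped. The sketch below uses a made-up array named `stat_mgc_map` of shape (6, 1), and assumes that `n, m` hold its shape in that order; neither the shape nor the unpacking order is taken from the scipy source.

```python
import numpy as np

# Hypothetical stand-in for the MGC map; the (6, 1) shape is an assumption.
stat_mgc_map = np.zeros((6, 1))
n, m = stat_mgc_map.shape  # assumed unpacking order: n = 6, m = 1

try:
    # Same indexing expression as the last traceback frame.
    stat_mgc_map[m - 1][n - 1]
    message = None
except IndexError as exc:
    message = str(exc)

print(message)

# With the indices in the other order, the lookup stays in bounds.
value = stat_mgc_map[n - 1][m - 1]
```

If this reading is right, the error would only appear for permutations that yield a map with a single column, which would be consistent with it showing up for some seeds and not others.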

Version information

  • OS: Arch Linux 6.6.7-arch1-1 (64-bit)
  • Python Version: 3.10
  • Package Versions: hyppo==0.4.0, scipy==1.11.4, joblib==1.3.2

Rayerdyne, Dec 22 '23 14:12

Sorry for the late response; this only just came onto my radar. I'll look into it.

sampan501, Feb 27 '24 15:02