TSCV icon indicating copy to clipboard operation
TSCV copied to clipboard

time boost in folds generation

Open aldder opened this issue 3 years ago • 3 comments

With contiguous test sets:

cv_orig = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]
cv_opt = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]
%%timeit
folds = list(cv_orig.split(np.arange(10000)))


... 1.21 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
folds = list(cv_opt.split(np.arange(10000)))


... 4.74 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With uncontiguous test sets:

cv_orig = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]
cv_opt = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]
%%timeit
folds = list(cv_orig.split(np.arange(10000)))

... 1.23 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
folds = list(cv_opt.split(np.arange(10000)))

... 4.78 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

aldder avatar Feb 01 '22 17:02 aldder

Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-02-04 12:11:14 UTC

pep8speaks avatar Feb 01 '22 17:02 pep8speaks

Hi @aldder , please time the updated version and report the performance gain.

WenjieZ avatar Feb 05 '22 04:02 WenjieZ

Codecov Report

Merging #42 (f5c38b3) into master (c05265a) will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   97.51%   97.51%           
=======================================
  Files           3        3           
  Lines         643      645    +2     
=======================================
+ Hits          627      629    +2     
  Misses         16       16           
Impacted Files Coverage Δ
tscv/_split.py 93.82% <100.00%> (+0.05%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c05265a...f5c38b3. Read the comment docs.

codecov[bot] avatar Feb 07 '22 07:02 codecov[bot]