TSCV
Speed up fold generation
With contiguous test sets:
import numpy as np
from tscv import GapKFold
cv_orig = GapKFold(n_splits=5, gap_before=1, gap_after=1)
for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)
... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]
cv_opt = GapKFold(n_splits=5, gap_before=1, gap_after=1)
for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)
... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]
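The listings above suggest the gap rule: an index is dropped from the training set if it is a test index or lies within `gap_before` positions before, or `gap_after` positions after, a test index. Below is a minimal vectorized sketch of that rule, inferred from the printed folds only. It is not the code changed in `tscv/_split.py`, and the helper name `gap_train_indices` is hypothetical.

```python
import numpy as np


def gap_train_indices(n_samples, test_index, gap_before=0, gap_after=0):
    """Return training indices, excluding the test indices and their gaps.

    Hypothetical helper: the rule is inferred from the folds printed in
    this thread, not taken from the tscv source.
    """
    test_index = np.asarray(test_index)
    # Offsets covering the gap before, the test index itself, and the gap after.
    offsets = np.arange(-gap_before, gap_after + 1)
    # Every position blocked by some test index (broadcasted sum, then flattened).
    blocked = (test_index[:, None] + offsets[None, :]).ravel()
    # Discard positions that fall outside the sample range.
    blocked = blocked[(blocked >= 0) & (blocked < n_samples)]
    excluded = np.zeros(n_samples, dtype=bool)
    excluded[blocked] = True
    return np.flatnonzero(~excluded)


# Second fold from the listing above: TEST [2 3] leaves TRAIN [0 5 6 7 8 9].
print(gap_train_indices(10, [2, 3], gap_before=1, gap_after=1))
```

The boolean mask avoids a Python-level loop over samples, which is one plausible way to get a speedup of this kind. Whether the PR takes the same approach is not stated in the thread.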
%%timeit
folds = list(cv_orig.split(np.arange(10000)))
... 1.21 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
folds = list(cv_opt.split(np.arange(10000)))
... 4.74 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
With non-contiguous test sets:
cv_orig = _XXX_(_xxx_, gap_before=1, gap_after=1)
for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)
... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]
cv_opt = _XXX_(_xxx_, gap_before=1, gap_after=1)
for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)
... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]
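Comparing the two listings by eye works for ten samples but not for larger inputs. A small sketch of a programmatic check follows. The helper `same_folds` is hypothetical and only assumes that each splitter yields `(train_index, test_index)` pairs, as the loops above do.

```python
import numpy as np


def same_folds(folds_a, folds_b):
    """Return True if two fold sequences match pair by pair, in order."""
    folds_a, folds_b = list(folds_a), list(folds_b)
    if len(folds_a) != len(folds_b):
        return False
    return all(
        np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)
        for (train_a, test_a), (train_b, test_b) in zip(folds_a, folds_b)
    )


# Stand-in data: the first two folds printed above, written out by hand.
reference = [
    (np.array([5, 6, 7, 8, 9]), np.array([0, 1, 2, 3])),
    (np.array([7, 8, 9]), np.array([0, 1, 4, 5])),
]
candidate = [(train.copy(), test.copy()) for train, test in reference]
print(same_folds(reference, candidate))
```

With the objects from this thread, the call would be `same_folds(cv_orig.split(X), cv_opt.split(X))` for some array `X`, for example `np.arange(10000)`.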
%%timeit
folds = list(cv_orig.split(np.arange(10000)))
... 1.23 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
folds = list(cv_opt.split(np.arange(10000)))
... 4.78 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
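For reference, the speedup implied by the four `%%timeit` means reported above can be computed directly. The sketch below only divides those means and ignores the reported standard deviations. The dictionary keys are labels chosen here for illustration.

```python
# Mean time per loop in seconds, copied from the %%timeit outputs above:
# (original implementation, optimized implementation).
timings = {
    "contiguous": (1.21, 4.74e-3),
    "non-contiguous": (1.23, 4.78e-3),
}

# Speedup factor = original time / optimized time.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

Both cases come out at roughly 255x for 10000 samples.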
Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2022-02-04 12:11:14 UTC
Hi @aldder, please time the updated version and report the performance gain.
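Outside a notebook, a similar measurement can be taken with the standard `timeit` module. The following is a minimal sketch. `time_split` and `toy_split` are hypothetical names, and `toy_split` is a stand-in rather than a tscv splitter.

```python
import timeit

import numpy as np


def time_split(make_folds, n_samples=10000, repeat=7, number=1):
    """Time list(make_folds(X)) and return (mean, std) in seconds per loop.

    Mirrors the "mean ± std. dev. of 7 runs" summary printed by %%timeit.
    """
    X = np.arange(n_samples)
    runs = timeit.repeat(lambda: list(make_folds(X)), repeat=repeat, number=number)
    per_loop = np.array(runs) / number
    return per_loop.mean(), per_loop.std()


def toy_split(X):
    """Stand-in splitter (hypothetical): one fold, first two samples as test."""
    indices = np.arange(len(X))
    yield indices[2:], indices[:2]


mean, std = time_split(toy_split, n_samples=100, repeat=3)
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop")
```

To time the splitters from this PR, pass `cv_orig.split` or `cv_opt.split` as `make_folds`.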
Codecov Report
Merging #42 (f5c38b3) into master (c05265a) will increase coverage by 0.00%. The diff coverage is 100.00%.
@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   97.51%   97.51%
=======================================
  Files           3        3
  Lines         643      645    +2
=======================================
+ Hits          627      629    +2
  Misses         16       16
Impacted Files | Coverage Δ | |
---|---|---|
tscv/_split.py | 93.82% <100.00%> (+0.05%) | :arrow_up: |