MLUtils.jl
Status of MLDataPattern porting
A list of what is currently exported from MLDataPattern.jl.
TO PORT
- [x] `getobs`, `getobs!` and `nobs`. #1
  - `nobs` is now `numobs`;
  - `obsdim` argument is dropped from the interface
- [x] `randobs` #1
- [x] `datasubset`, `DataSubset` #4
- [x] `shuffleobs` #5
- [x] `splitobs` #5
- [x] `DataView` #5
  - [ ] Consider removal
- [x] `obsview`, `ObsView` #5
  - [ ] Consider removal #8
- [x] `batchview`, `BatchView` #6
- [x] `batchsize` #6
- [ ] `slidingwindow`, `SlidingWindow`
- [ ] `stratifiedobs`
- [x] `oversample`, `undersample` #10
- [x] `kfolds` #9
- [x] `leaveout` #9
- [x] `eachobs` #9
- [x] `eachbatch` #9
NOT TO BE PORTED
- `BufferGetObs`
- `RandomObs`, `RandomBatches`
- `BalancedObs`
- `FoldView`
- `targets`
- `eachtarget`
We can consider this essentially done
Hi, what about `stratifiedobs` and `slidingwindow`?
Were they explicitly excluded on purpose?
Thanks
Not really, we just didn't port code that we weren't sure was going to be useful. I think `stratifiedobs` should go in. I'm less sure about `slidingwindow`, but I didn't look much into it or into alternatives in the ecosystem.
Just to clarify, I came here specifically for the missing `stratifiedobs`.
It is needed to replicate the behaviour of Python's `sklearn.model_selection.train_test_split([...], stratify=y)`.
I asked about `slidingwindow` on the way, since it was the only other unchecked item that is not in the list of explicitly "not to be ported", but I don't have any use case for it.
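For anyone who needs this before `stratifiedobs` is ported, here is a minimal sketch of a stratified train/test split on indices, using only Base and the `Random` standard library. The function name `stratified_split_indices` and its `at` and `rng` keywords are made up for illustration. This is neither the MLDataPattern implementation nor an MLUtils API, and it rounds the per-class counts, so tiny classes may end up entirely on one side.

```julia
using Random

# Hypothetical helper, not part of MLDataPattern.jl or MLUtils.jl.
# Returns (train_idx, test_idx) so that roughly a fraction `at` of the
# observations of every class ends up in the training indices.
function stratified_split_indices(labels::AbstractVector; at::Real = 0.7,
                                  rng::AbstractRNG = Random.default_rng())
    0 < at < 1 || throw(ArgumentError("`at` must lie strictly between 0 and 1"))
    train_idx = Int[]
    test_idx = Int[]
    for class in unique(labels)
        # Indices of all observations of this class, in random order.
        idx = shuffle(rng, findall(isequal(class), labels))
        ntrain = round(Int, at * length(idx))
        append!(train_idx, idx[1:ntrain])
        append!(test_idx, idx[ntrain+1:end])
    end
    # Shuffle again so the classes are not grouped together in the result.
    return shuffle!(rng, train_idx), shuffle!(rng, test_idx)
end

# Imbalanced toy labels: 80 observations of :a and 20 of :b.
labels = vcat(fill(:a, 80), fill(:b, 20))
train_idx, test_idx = stratified_split_indices(labels; at = 0.75)
```

The returned indices can then be used to index the features and targets, e.g. `X[:, train_idx]` if observations are stored along the last dimension.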
I use `slidingwindow` often for time series data. I haven't looked too hard for a replacement, but the closest I've found is `partition` from IterTools.jl. It has a similar interface but returns an iterator of tuples.
I also found `DSP.Periodograms.arraysplit`, which is similar to `slidingwindow` except that you set the overlap instead of the stride. So far `slidingwindow` is the fastest of the three because it returns views.
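To illustrate why returning views matters, here is a rough Base-only sketch of a strided window over a vector. The name `sliding_views` and its `width` and `stride` arguments are hypothetical and do not mirror the `slidingwindow` signature. Each window is a `view` into the original array, so no data is copied. An overlap-based interface is related by `overlap = width - stride`.

```julia
# Hypothetical helper, not the MLDataPattern.jl implementation.
# Returns a vector of views of length `width` into `x`, where each
# window starts `stride` elements after the previous one.
# Trailing elements that do not fill a whole window are dropped.
function sliding_views(x::AbstractVector, width::Integer; stride::Integer = 1)
    (width > 0 && stride > 0) ||
        throw(ArgumentError("`width` and `stride` must be positive"))
    starts = firstindex(x):stride:(lastindex(x) - width + 1)
    return [view(x, s:(s + width - 1)) for s in starts]
end

x = collect(1:10)
windows = sliding_views(x, 4; stride = 2)

# Writing to the parent array is visible through every window that
# covers the modified position, because the windows share its memory.
x[3] = 0
```

Copying each window instead (for example with `x[s:(s + width - 1)]`) would allocate a new array per window, which is presumably where the speed difference comes from.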