MLUtils.jl Status of MLDataPattern porting

A list of what is currently exported from MLDataPattern.jl.

TO PORT

- [x] getobs, getobs! and nobs. #1
  - nobs is now numobs
  - the obsdim argument is dropped from the interface
- [x] randobs #1
- [x] datasubset, DataSubset #4
- [x] shuffleobs #5
- [x] splitobs #5
- [x] DataView #5
  - [ ] Consider removal
- [x] obsview, ObsView #5
  - [ ] Consider removal #8
- [x] batchview, BatchView #6
- [x] batchsize #6
- [ ] slidingwindow, SlidingWindow
- [ ] stratifiedobs
- [x] oversample, undersample #10
- [x] kfolds #9
- [x] leaveout #9
- [x] eachobs #9
- [x] eachbatch #9

NOT TO BE PORTED

BufferGetObs
RandomObs, RandomBatches
BalancedObs
FoldView
targets
eachtarget

Dec 27 '21 11:12 CarloLucibello

We can consider this essentially done

Jan 30 '22 09:01 CarloLucibello

Hi, what about stratifiedobs and slidingwindow? Were they excluded on purpose? Thanks.

Aug 04 '22 21:08 rmkn85

Not really, we just didn't port code that we weren't sure would be useful. I think stratifiedobs should go in. I'm less sure about slidingwindow, but I haven't looked much into it or into the alternatives in the ecosystem.

Aug 05 '22 07:08 CarloLucibello

Just to clarify, I came here specifically for the missing stratifiedobs. It is needed to replicate the behaviour of Python's sklearn.model_selection.train_test_split([...], stratify=y), where y holds the class labels.

I asked about slidingwindow while I was at it, since it was the only other unchecked item that is not in the explicit "not to be ported" list, but I don't have a use case for it.

Aug 05 '22 07:08 rmkn85
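
For readers who land here for the same reason, the idea behind a stratified split can be sketched in plain Julia. This is only an illustration under stated assumptions: `stratified_split` and its `at` keyword are hypothetical names, not the MLDataPattern.jl or MLUtils.jl API, and rounding the training count per class is just one reasonable choice.

```julia
using Random

# Hypothetical helper (not part of MLUtils.jl or MLDataPattern.jl).
# Returns (train_idx, test_idx) so that each label contributes roughly a
# fraction `at` of its observations to the training indices.
function stratified_split(rng::AbstractRNG, labels::AbstractVector; at::Real = 0.7)
    0 < at < 1 || throw(ArgumentError("at must be in (0, 1)"))
    train_idx, test_idx = Int[], Int[]
    for label in unique(labels)
        # Shuffle the positions of this class, then cut them at the ratio.
        idx = shuffle(rng, findall(==(label), labels))
        ntrain = round(Int, at * length(idx))
        append!(train_idx, idx[1:ntrain])
        append!(test_idx, idx[ntrain+1:end])
    end
    # Shuffle again so the classes are not grouped together in the output.
    return shuffle(rng, train_idx), shuffle(rng, test_idx)
end

labels = vcat(fill(:a, 80), fill(:b, 20))
train_idx, test_idx = stratified_split(MersenneTwister(0), labels; at = 0.75)
```

With 80 observations of `:a` and 20 of `:b`, both index sets keep the 4:1 class ratio, which is the property that `stratify` provides in scikit-learn.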

I often use slidingwindow for time series data. I haven't looked much for a replacement, but the closest I've found is partition from IterTools.jl. It has a similar interface but returns an iterator of tuples.

Aug 11 '22 09:08 kpa28-git

I also found DSP.Periodograms.arraysplit, which is similar to slidingwindow, but you set the overlap instead of the stride. So far slidingwindow is the fastest of the three because it returns views.

Aug 11 '22 09:08 kpa28-git
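
To make the stride versus overlap distinction and the role of views concrete, here is a minimal sketch in plain Julia. `sliding_views` is a hypothetical helper written for this illustration, not the MLDataPattern.jl implementation, and it assumes a one-dimensional series.

```julia
# Hypothetical helper (not the MLDataPattern.jl slidingwindow).
# Returns a vector of views of length `width`, whose start positions are
# `stride` elements apart. No data is copied.
function sliding_views(data::AbstractVector, width::Integer; stride::Integer = 1)
    width > 0 || throw(ArgumentError("width must be positive"))
    stride > 0 || throw(ArgumentError("stride must be positive"))
    starts = firstindex(data):stride:(lastindex(data) - width + 1)
    return [view(data, s:(s + width - 1)) for s in starts]
end

x = collect(1:10)
windows = sliding_views(x, 4; stride = 2)   # windows start at 1, 3, 5, 7

# Because the windows are views, writing through one changes the parent array.
windows[1][1] = 100
```

A window width of 4 with a stride of 2 corresponds to an overlap of 2 (overlap = width - stride), so the two parameterizations describe the same windows. Returning views avoids allocating a copy per window, which is consistent with the speed difference reported above.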
