MLDataPattern.jl
StratifiedBatches for labeled data
It would be nice to have a new data iterator that samples the given data so that each iteration returns a batch with approximately the same class distribution as the given data container.
X = rand(2, 6) # some features
y = [:a, :a, :a, :a, :b, :b]
for (xbatch, ybatch) in StratifiedBatches((X, y), size = 3, count = 10)
    # ybatch is always either [:a, :a, :b] or [:a, :b, :a] or [:b, :a, :a]
end
- see RandomBatches for an example of a batch iterator: https://github.com/JuliaML/MLDataPattern.jl/blob/master/src/dataiterator.jl
- see stratifiedobs for an example of how to compute the indices of the observations that belong to each class (look at how labelmap is used)
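To make the idea concrete, here is a rough sketch of the index bookkeeping such an iterator might need, written with only Base and the Random standard library. The helper names (`class_indices`, `class_quota`, `stratified_batch_indices`) are hypothetical and not part of MLDataPattern.jl. Sampling with replacement within each class and the largest-remainder rounding are assumptions made for illustration, not a proposal for the final design.

```julia
using Random

# Hypothetical helper: map each label to the indices of its observations
# (similar in spirit to what `labelmap` provides).
function class_indices(y::AbstractVector)
    lm = Dict{eltype(y), Vector{Int}}()
    for (i, label) in enumerate(y)
        push!(get!(lm, label, Int[]), i)
    end
    return lm
end

# Hypothetical helper: split `batchsize` slots among the classes in
# proportion to class frequency. The largest-remainder method is used
# so that the per-class counts always sum to `batchsize`.
function class_quota(lm::Dict, batchsize::Int)
    n = sum(length, values(lm))
    labels = collect(keys(lm))
    exact = [batchsize * length(lm[l]) / n for l in labels]
    quota = floor.(Int, exact)
    leftover = batchsize - sum(quota)
    order = sortperm(exact .- quota, rev = true)
    for k in order[1:leftover]
        quota[k] += 1
    end
    return Dict(zip(labels, quota))
end

# Hypothetical helper: draw the observation indices for one batch.
# Assumption: indices are sampled with replacement within each class.
function stratified_batch_indices(lm::Dict, quota::Dict; rng = Random.default_rng())
    idx = Int[]
    for (label, k) in quota
        append!(idx, rand(rng, lm[label], k))
    end
    return shuffle!(rng, idx)  # mix the classes within the batch
end

X = rand(2, 6) # some features
y = [:a, :a, :a, :a, :b, :b]
lm = class_indices(y)
quota = class_quota(lm, 3)
for _ in 1:10
    batch = stratified_batch_indices(lm, quota)
    xbatch, ybatch = X[:, batch], y[batch]
    # ybatch contains two :a and one :b, in random order
end
```

An actual `StratifiedBatches` type would presumably wrap this logic in the iteration interface the same way `RandomBatches` does, and use `datasubset` or `getobs` instead of direct array indexing so that it works with arbitrary data containers.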