MLDataPattern.jl

StratifiedBatches for labeled data

Open · Evizero opened this issue 8 years ago · 0 comments

It would be nice to have a new data iterator that samples the given data so that each iteration returns a batch whose class distribution approximately matches that of the full data container.

```julia
X = rand(2, 6)               # some features
y = [:a, :a, :a, :a, :b, :b] # 4 × :a, 2 × :b, i.e. a 2:1 ratio
for (xbatch, ybatch) in StratifiedBatches((X, y), size = 3, count = 10)
    # ybatch is always [:a, :a, :b], [:a, :b, :a], or [:b, :a, :a]
end
```
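The core of the proposal is deciding how many observations of each class go into one batch. Below is a minimal sketch, assuming a largest-remainder allocation (floor the proportional counts, then give leftover slots to the classes with the largest fractional parts); the issue does not say how rounding should work, and `class_quota` is a hypothetical helper, not part of MLDataPattern.jl.

```julia
# Hypothetical helper: how many observations of each class one batch of
# `batchsize` should contain so that it mirrors the class distribution of `y`.
# Leftover slots after flooring go to the classes with the largest fractional
# parts (largest-remainder method), so the counts always sum to `batchsize`.
function class_quota(y::AbstractVector, batchsize::Integer)
    0 < batchsize <= length(y) || throw(ArgumentError("batchsize must be in 1:length(y)"))
    labels = unique(y)
    exact = [batchsize * count(==(l), y) / length(y) for l in labels]
    quota = floor.(Int, exact)
    leftover = batchsize - sum(quota)
    for j in sortperm(exact .- quota; rev = true)[1:leftover]
        quota[j] += 1
    end
    return Dict(zip(labels, quota))
end

y = [:a, :a, :a, :a, :b, :b]
q = class_quota(y, 3)  # two :a and one :b per batch
```

Because each class's quota never exceeds its size when `batchsize <= length(y)`, every class can always be sampled without replacement within a batch.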
  • See `RandomBatches` for an example of a batch iterator: https://github.com/JuliaML/MLDataPattern.jl/blob/master/src/dataiterator.jl
  • See `stratifiedobs` for an example of how to compute the indices of the observations that belong to each class (look at how `labelmap` is used).
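Following these two hints, the per-class index lists and the sampling loop could look roughly like the sketch below. It is self-contained and makes several assumptions: `class_indices` is a hypothetical stand-in for what `labelmap` provides (label => indices of the observations with that label), and `StratifiedIndexBatches` is a hypothetical iterator that takes the per-class quota as given and yields index vectors rather than data subsets.

```julia
using Random

# Hypothetical stand-in for `labelmap`: label => indices of observations with that label.
function class_indices(y::AbstractVector)
    lm = Dict{eltype(y), Vector{Int}}()
    for (i, label) in pairs(y)
        push!(get!(lm, label, Int[]), i)
    end
    return lm
end

# Hypothetical iterator: yields `count` batches of indices, each drawing
# `quota[label]` observations (without replacement) from every class.
struct StratifiedIndexBatches{L, R<:AbstractRNG}
    lm::Dict{L, Vector{Int}}
    quota::Dict{L, Int}
    count::Int
    rng::R
end

Base.length(it::StratifiedIndexBatches) = it.count
Base.eltype(::Type{<:StratifiedIndexBatches}) = Vector{Int}

function Base.iterate(it::StratifiedIndexBatches, state::Int = 1)
    state > it.count && return nothing
    batch = Int[]
    for (label, k) in it.quota
        # take k distinct observations of this class
        append!(batch, shuffle(it.rng, it.lm[label])[1:k])
    end
    shuffle!(it.rng, batch)  # mix classes so a position does not reveal the label
    return batch, state + 1
end

# Usage with the data from the example above
X = rand(2, 6)
y = [:a, :a, :a, :a, :b, :b]
it = StratifiedIndexBatches(class_indices(y), Dict(:a => 2, :b => 1), 10, MersenneTwister(42))
for idx in it
    xbatch, ybatch = X[:, idx], y[idx]  # ybatch holds two :a and one :b
end
```

Observations never repeat within a batch but can reappear in later batches.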

Evizero, May 02 '17 09:05