MLDataPattern.jl
Utility package for subsetting, resampling, iteration, and partitioning of various types of data sets in Machine Learning
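For orientation, here is a minimal sketch of the typical workflow the package supports, using the documented `shuffleobs`, `splitobs`, and `eachbatch` helpers (the data and split ratio are just illustrative):

```julia
using MLDataPattern

X = rand(4, 150)     # 150 observations with 4 features each
Y = rand(150)

# subsetting / partitioning: shuffle lazily, then split 70/30
(xtrain, ytrain), (xtest, ytest) = splitobs(shuffleobs((X, Y)), at = 0.7)

# iteration in mini-batches
for (x, y) in eachbatch((xtrain, ytrain), size = 5)
    # ... use the batch ...
end
```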
This is a WIP in response to https://github.com/JuliaML/LearnBase.jl/pull/44. The changes to the LearnBase.jl interface reduce the MLDataPattern.jl codebase significantly. Most notably, we are able to avoid a lot of the...
This pull request changes the compat entry for the `LearnBase` package from `0.4` to `0.4, 0.5`. This keeps the compat entries for earlier versions. Note: I have not tested your...
The arg `maxsize` was added to `eachbatch` in #9 (in response to #8). I believe `eachbatch` could use one more optional argument: `zero_remainder`. If you set `maxsize` and `zero_remainder =...
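A rough sketch of what the proposed `zero_remainder` behaviour could amount to, expressed on top of the existing `maxsize` argument (assuming `maxsize` is accepted as a keyword, as added in #9). The helper name `pad_batches` is hypothetical and not part of the package:

```julia
using MLDataPattern

# Keep the partial last batch, but pad it with zeros so every
# batch ends up with exactly `sz` observations.
function pad_batches(X::AbstractMatrix, sz::Integer)
    batches = Matrix{eltype(X)}[]
    for b in eachbatch(X, maxsize = sz)
        padded = zeros(eltype(X), size(X, 1), sz)
        padded[:, 1:nobs(b)] .= getobs(b)
        push!(batches, padded)
    end
    return batches
end

pad_batches(rand(2, 10), 4)   # 3 batches of 4; the last batch is zero-padded
```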
I'm raising this issue prompted by [this Discourse post](https://discourse.julialang.org/t/dataloaders-jl-workers-systematically-end-up-outside-bounds/62928/2), which led me to discover that `nobs(::BatchView)` returns not the number of batches but the number of wrapped observations. This is in...
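A minimal snippet illustrating the mismatch described above, assuming the behaviour reported in the issue:

```julia
using MLDataPattern

X  = rand(2, 10)
bv = batchview(X, size = 2)   # 5 batches of 2 observations each

length(bv)   # 5, the number of batches
nobs(bv)     # 10, the number of wrapped observations (the surprising part)
```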
I've been stuck on this for a while now and just traced it back to the behavior of `eachbatch` being different from what I would expect. My data is shown...
It seems that if I apply `shuffleobs` and then wrap the result in `Flux.DataLoader`, the `|> gpu` call no longer moves the data to the GPU :-(
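One possible workaround, sketched below rather than a fix for the underlying issue: materialise the lazy shuffled view with `getobs` so the `DataLoader` holds plain arrays, and move each batch to the GPU explicitly inside the loop:

```julia
using MLDataPattern, Flux

X = rand(Float32, 2, 100)
Y = rand(Float32, 1, 100)

# Materialise the lazy shuffled view into plain Arrays before wrapping it
xs, ys = getobs(shuffleobs((X, Y)))
loader = Flux.DataLoader((xs, ys), batchsize = 16)

for (x, y) in loader
    x, y = gpu(x), gpu(y)   # move each batch explicitly
    # ... training step ...
end
```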
I just implemented `stratifiedkfolds` for my own work. It might be something that could live in this package. I'm not too familiar with the internals of MLDataPattern, so it might...
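For reference, a rough sketch of how a stratified k-fold split could be computed: partition the indices of each class into k folds separately, then merge them, so every fold roughly preserves the overall label proportions. The name `stratifiedkfolds_sketch` and the return format (train/validation index pairs) are illustrative only, not the actual implementation from the issue:

```julia
using Random

function stratifiedkfolds_sketch(targets::AbstractVector, k::Integer = 5)
    folds = [Int[] for _ in 1:k]
    for label in unique(targets)
        # shuffle the indices of this class and deal them out round-robin
        idx = shuffle(findall(==(label), targets))
        for (i, j) in enumerate(idx)
            push!(folds[mod1(i, k)], j)
        end
    end
    # return (train_indices, validation_indices) pairs, one per fold
    [(reduce(vcat, folds[setdiff(1:k, f)]), folds[f]) for f in 1:k]
end

y = repeat(["a", "a", "a", "b"], 25)          # imbalanced toy labels
index_pairs = stratifiedkfolds_sketch(y, 5)   # usable with datasubset(data, idx)
```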
Currently `splitobs(data, at[, obsdim]) → NTuple` is slow when splitting data into a very large number of parts. For example, `@time splitobs(rand(100000), at=ntuple(i->1/10001, 10000))` takes an extremely long time to run. Proposal: make `at`...
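The proposal itself is truncated above, but one way such a split could be computed without recursive splitting is to derive all cut points from a single cumulative sum over `at` and take contiguous `datasubset`s. The function name below is hypothetical:

```julia
using MLDataPattern

function splitobs_flat(data; at::Tuple)
    n    = nobs(data)
    cuts = [0; round.(Int, cumsum(collect(at)) .* n); n]
    # one lazy, contiguous subset per segment between consecutive cut points
    [datasubset(data, cuts[i]+1:cuts[i+1]) for i in 1:length(cuts)-1]
end

splitobs_flat(rand(100_000), at = ntuple(i -> 1/10001, 10000))
```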
After #9, I was thinking about the ways one can handle a batch size that does not evenly divide the number of observations. I think MLDataPattern could do with one or two more options. I will enumerate them here for consideration....
This should be supported.

```julia
dataset = (x = rand(10), y = rand(2, 10))
for batch in eachbatch(dataset; size = 3)
    # do something with batch.x and batch.y
end
```
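Until NamedTuples are supported directly, one workaround (a sketch only) is to batch the fields as a plain Tuple, which MLDataPattern already supports as a data container, and rebuild the NamedTuple per batch:

```julia
using MLDataPattern

dataset = (x = rand(10), y = rand(2, 10))

for (bx, by) in eachbatch((dataset.x, dataset.y), size = 3)
    batch = (x = bx, y = by)
    # do something with batch.x and batch.y
end
```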