MLDataPattern.jl

Utility package for subsetting, resampling, iteration, and partitioning of various types of data sets in Machine Learning

Results 16 MLDataPattern.jl issues

This is a WIP in response to https://github.com/JuliaML/LearnBase.jl/pull/44. The changes to the LearnBase.jl interface reduce the MLDataPattern.jl codebase significantly. Most notably, we are able to avoid a lot of the...

This pull request changes the compat entry for the `LearnBase` package from `0.4` to `0.4, 0.5`. This keeps the compat entries for earlier versions. Note: I have not tested your...

The arg `maxsize` was added to `eachbatch` in #9 (in response to #8). I believe `eachbatch` could use one more optional argument: `zero_remainder`. If you set `maxsize` and `zero_remainder =...

Raising this issue prompted by [this Discourse post](https://discourse.julialang.org/t/dataloaders-jl-workers-systematically-end-up-outside-bounds/62928/2) which led me to uncover that `nobs(::BatchView)` returns not the number of batches, but the number of wrapped observations. This is in...
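The two quantities being confused here can be illustrated without MLDataPattern at all. The sketch below uses Base's `Iterators.partition` as a stand-in for a batch view (an assumption made purely for illustration; it is not `BatchView`, and it keeps a shorter final batch, which may differ from how `BatchView` treats a remainder). The names `data`, `batches`, `n_batches`, and `n_observations` are hypothetical.

```julia
# Stand-in for a batched view: 10 observations split into batches of up to 3.
data = collect(1:10)
batches = collect(Iterators.partition(data, 3))

# Number of batches: what a caller indexing batch-by-batch needs to know.
n_batches = length(batches)

# Number of wrapped observations: the total across all batches.
n_observations = sum(length, batches)
```

A consumer that iterates `1:n_observations` while indexing into `batches` would run past the end, which is the kind of out-of-bounds access the linked Discourse post describes.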

I've been stuck on this for a while now and just traced it back to the behavior of `eachbatch` being different than what I would expect. My data is shown...

It seems that if I have a `shuffleobs` and then wrap it in `Flux.DataLoader`, `|> gpu` no longer moves the data to the GPU :-(

I just implemented `stratifiedkfolds` for my own work. It might be something that could live in this package. I'm not too familiar with the internals of MLDataPattern, so it might...
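For readers unfamiliar with the idea, here is a minimal sketch of stratified fold assignment in plain Julia. It is not the implementation mentioned above and not part of MLDataPattern; the function name `stratified_fold_indices` and its signature are hypothetical. It shuffles the indices of each class and deals them round-robin into `k` folds so that every fold gets a similar class mix.

```julia
using Random

# Hypothetical helper (not part of MLDataPattern): assign each observation
# index to one of k folds so that each class is spread evenly across folds.
function stratified_fold_indices(labels::AbstractVector, k::Integer;
                                 rng = Random.default_rng())
    folds = [Int[] for _ in 1:k]
    for class in unique(labels)
        # Indices of this class, in random order.
        idx = shuffle(rng, findall(==(class), labels))
        # Deal them round-robin into the k folds.
        for (j, i) in enumerate(idx)
            push!(folds[mod1(j, k)], i)
        end
    end
    return folds
end

labels = repeat([:a, :b], inner = 6)   # 6 observations per class
folds = stratified_fold_indices(labels, 3)
```

Each entry of `folds` can serve as the validation indices for one split, with the remaining indices used for training.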

Currently `splitobs(data, at[, obsdim]) → NTuple` is slow when splitting some data into many many parts. For example, `@time splitobs(rand(100000), at=ntuple(i->1/10001, 10000))` will take forever to run. Proposal: make `at`...

After #9, I was thinking about the ways one can handle a batch size that does not evenly divide the number of observations. I think MLDataPattern could do with one or two more. I will enumerate them here for consideration....

This should be supported.

```julia
dataset = (x=rand(10), y=rand(2, 10))
for batch in eachbatch(dataset; size=3)
    # do something with batch.x and batch.y
end
```

enhancement
help wanted