DataLoaders.jl

Rename/PR to MLDataPattern.jl

Open · darsnack opened this issue 4 years ago · 1 comment

Great work!

I've been working on a similar idea, and I was wondering if you would consider making this work a PR to MLDataPattern.jl? The key feature here is the async iterator, and I was planning on adding such an iterator to MLDataPattern.jl. Features like collation and batching can be done as modifications to the existing BatchView in MLDataPattern.jl.

darsnack · Aug 29 '20 12:08

Reposting here an adapted version from our Zulip convo:

Would definitely consider it. I've tried to write DataLoaders.jl as composable pieces, similar to MLDataPattern.jl. The DataLoader interface is more like a thin wrapper around the following pieces:

  • batchviewcollated: like MLDataPattern.batchview, but collates the batches while still supporting getobs!
  • GetObsAsync: makes a data iterator from your data container, but loads samples off the main thread with multiple workers.
  • BufferGetObsAsync: like MLDataPattern.eachobs, but loads data in parallel like GetObsAsync above. Supports in-place getobs! with a ring buffer.
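
As a rough illustration of how collation and worker-based loading fit together, here is a minimal sketch using only Base Julia. It is not the actual DataLoaders.jl code: `getobs_sketch`, `collate`, and `async_eachobs` are made-up names, the data container is assumed to be a plain vector, and the in-place ring buffer is left out.

```julia
# Hypothetical sketch, NOT the DataLoaders.jl implementation.
# All function names below are illustrative assumptions.

# Stand-in for an expensive load of observation `i` from a data container.
getobs_sketch(data::AbstractVector, i::Integer) = data[i]

# Collate equally sized vector samples into one batch matrix,
# with each sample becoming one column.
collate(samples::AbstractVector{<:AbstractVector}) = reduce(hcat, samples)

# Return a Channel yielding `(i, obs)` pairs. Observations are loaded by
# `nworkers` tasks, so they may arrive out of order.
function async_eachobs(data::AbstractVector; nworkers::Int = Threads.nthreads())
    n = length(data)
    # Queue of indices to load; closed once every index is enqueued.
    indices = Channel{Int}(n)
    foreach(i -> put!(indices, i), 1:n)
    close(indices)

    # Bounded output buffer: workers block when the consumer falls behind.
    results = Channel{Tuple{Int,eltype(data)}}(nworkers)
    Threads.@spawn begin
        # Wait for all workers to drain the index queue, then signal the end.
        @sync for _ in 1:nworkers
            Threads.@spawn for i in indices
                put!(results, (i, getobs_sketch(data, i)))
            end
        end
        close(results)
    end
    return results
end

# Usage: load 8 samples with 2 workers, restore order, collate into a batch.
data = [fill(Float32(i), 3) for i in 1:8]
loaded = sort!(collect(async_eachobs(data; nworkers = 2)); by = first)
batch = collate(last.(loaded))
```

Returning a `Channel` keeps the consumer side an ordinary `for` loop, and the bounded buffer gives backpressure for free. Start Julia with more than one thread (for example `julia -t 4`) for the workers to actually run in parallel.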

I would like to see if the functionality is stable before merging it into MLDataPattern.jl, since I'm not great when it comes to parallel programming and there might be some subtle bugs still lurking in the code.

lorenzoh · Aug 31 '20 08:08