
[WIP] Dataset iterator

Open stephanie-wang opened this issue 2 years ago • 3 comments

Why are these changes needed?

TODO: Docs and deprecation warnings for APIs.

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

stephanie-wang avatar Dec 19 '22 22:12 stephanie-wang

@amogkam @clarkzinzow @bveeramani assigning to you all to review the API!

matthewdeng avatar Dec 20 '22 21:12 matthewdeng

Are we still planning to remove iter_batches from the base Dataset API and instead have ds.to_iterator().iter_batches()?

Basically avoiding duplicate APIs in both Dataset and DatasetIterator

amogkam avatar Dec 20 '22 22:12 amogkam

> Are we still planning to remove iter_batches from the base Dataset API and instead have ds.to_iterator().iter_batches()?
>
> Basically avoiding duplicate APIs in both Dataset and DatasetIterator

Eventually yes, but my opinion on this is that we should wait for at least a full release cycle so we can first stabilize the DatasetIterator API.

stephanie-wang avatar Dec 20 '22 22:12 stephanie-wang
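
For readers following the thread, here is a rough sketch of the API shape under discussion: batch iteration lives on a separate iterator object, and the older method on the dataset stays as a thin wrapper that warns for a release cycle. This is a toy illustration only, not Ray's implementation. The `Dataset` and `DatasetIterator` classes below are hypothetical stand-ins that borrow the names from the comments above, and the `batch_size` parameter, the warning text, and the list-backed storage are all assumptions.

```python
import warnings
from typing import Iterator, List


class DatasetIterator:
    """Toy stand-in (not Ray's class): owns the batch-iteration logic."""

    def __init__(self, rows: List[int]):
        self._rows = rows

    def iter_batches(self, batch_size: int = 2) -> Iterator[List[int]]:
        # Yield consecutive slices of at most `batch_size` rows.
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start:start + batch_size]


class Dataset:
    """Toy stand-in (not Ray's class): holds rows in a plain list."""

    def __init__(self, rows: List[int]):
        self._rows = list(rows)

    def to_iterator(self) -> DatasetIterator:
        # The call shape proposed in the thread: ds.to_iterator().iter_batches()
        return DatasetIterator(self._rows)

    def iter_batches(self, batch_size: int = 2) -> Iterator[List[int]]:
        # Hypothetical deprecation path: keep the old entry point working,
        # but warn and delegate so there is a single implementation.
        warnings.warn(
            "Dataset.iter_batches is deprecated in this sketch; "
            "use Dataset.to_iterator().iter_batches() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.to_iterator().iter_batches(batch_size=batch_size)


ds = Dataset(list(range(5)))
new_style = list(ds.to_iterator().iter_batches(batch_size=2))
```

Keeping the old method as a delegating wrapper is one way to avoid duplicated logic while the iterator API stabilizes, and the wrapper can be deleted later without touching the iteration code.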