
[Data] Infer the data schema in Ray Datasets

Open zhe-thoughts opened this issue 2 years ago • 1 comment

Description

Right now, with "strict mode" enabled by default, users need to make sure they use the right schema when passing functions into map or map_batches.

Ideally, we should infer the schema automatically (I guess we already do that in the ray.data.read_xxx calls). Another reference is how Dask DataFrame handles apply: https://docs.dask.org/en/stable/generated/dask.dataframe.DataFrame.apply.html
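
To make the idea concrete, here is a rough sketch of one way inference could work: run the user's function on a small sample batch and read the column names and value types off the result. This is plain Python and not Ray Data's actual implementation or API. The helper name `infer_output_schema`, the `Batch` alias, and the example function `add_area` are all hypothetical, made up for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical batch representation for this sketch: column name -> list of values.
Batch = Dict[str, List[Any]]


def infer_output_schema(fn: Callable[[Batch], Batch], sample: Batch) -> Dict[str, str]:
    """Run fn on a small sample batch and report column name -> type name.

    Illustrative helper only; it is not part of Ray Data.
    """
    out = fn(sample)
    if not isinstance(out, dict):
        raise TypeError(f"expected a dict of columns, got {type(out).__name__}")

    schema: Dict[str, str] = {}
    for name, values in out.items():
        type_names = {type(v).__name__ for v in values}
        # Exactly one observed type: use it. Empty or mixed columns fall back to "object".
        schema[name] = type_names.pop() if len(type_names) == 1 else "object"
    return schema


def add_area(batch: Batch) -> Batch:
    """Example user function: keeps the input columns and adds a derived one."""
    area = [w * h for w, h in zip(batch["width"], batch["height"])]
    return {"width": batch["width"], "height": batch["height"], "area": area}


sample = {"width": [1, 2], "height": [3.0, 4.0]}
schema = infer_output_schema(add_area, sample)
print(schema)
```

The obvious trade-off is that the function runs once on sample data, so it has to be cheap and free of side effects for this to be safe, and a small sample may not show every type a column can take.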

Use case

This will simplify users' mental model of batch processing with Ray Data.

zhe-thoughts, May 10 '23 21:05

This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.

Please comment and remove the pending-cleanup label if you believe this issue should remain open.

Thanks for contributing to Ray!

cszhu, Jun 17 '25 00:06