[Data] Infer the data schema in Ray Datasets
Description
Right now, with "strict mode" enabled by default, users need to make sure they use the right schema when they pass functions to map or map_batches.
Ideally, we should infer the schema (I guess we already do that in the ray.data.read_xxx calls). Another reference is Dask DataFrame: https://docs.dask.org/en/stable/generated/dask.dataframe.DataFrame.apply.html
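To make the request concrete, here is a rough sketch of one way inference could work: run the user function on a small sample batch and read the column types off the result, loosely in the spirit of how Dask tries to infer output metadata when none is supplied. This is only an illustration under stated assumptions. infer_output_schema, the Batch alias, and add_area are hypothetical names, not part of Ray Data, and batches are modeled as plain dicts of Python lists rather than Ray's real batch formats.

```python
from typing import Any, Callable, Dict, List

# Assumption: a batch is a dict mapping column name -> list of values.
Batch = Dict[str, List[Any]]


def infer_output_schema(fn: Callable[[Batch], Batch], sample: Batch) -> Dict[str, str]:
    """Hypothetical helper (not a Ray Data API): run `fn` on a small
    sample batch and report the Python type name of each output column."""
    out = fn(sample)
    if not isinstance(out, dict):
        raise TypeError(f"expected a dict of columns, got {type(out).__name__}")
    schema: Dict[str, str] = {}
    for name, values in out.items():
        type_names = {type(v).__name__ for v in values}
        if len(type_names) > 1:
            # Fail early instead of guessing a common type.
            raise TypeError(f"column {name!r} mixes types: {sorted(type_names)}")
        schema[name] = type_names.pop() if type_names else "unknown"
    return schema


def add_area(batch: Batch) -> Batch:
    # Example of the kind of user function passed to map_batches.
    area = [w * h for w, h in zip(batch["width"], batch["height"])]
    return {**batch, "area": area}


sample = {"width": [1.0, 2.0], "height": [3.0, 4.0]}
inferred = infer_output_schema(add_area, sample)
print(inferred)
```

The trade-off is that the function runs once on sample data before the real job, so it would need to be cheap and free of side effects, and a sample may not exercise every code path.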
Use case
This will simplify users' mental model when doing batch processing with Ray Data.
This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.
Please comment and remove the pending-cleanup label if you believe this issue should remain open.
Thanks for contributing to Ray!