Ray Data - Glob/wildcard in file path
Description
Add the ability to use wildcards (glob patterns) in the file path for a dataset. I use this daily in Spark.
Use case
I have S3 prefixes containing tens of thousands of files. When testing, I often work with a subset of these files before creating a job to process the entire prefix. To do this, I would like to be able to use a wildcard.
Example:
s3://my_data/part-00000.
To select ~100 files, I should be able to pass a pattern such as: s3://my_data/part-000*.json.snappy
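To illustrate the requested behavior, here is a minimal sketch of client-side wildcard expansion using only the Python standard library. The helper `expand_glob` and the generated `listing` are hypothetical stand-ins (in practice the listing would come from enumerating the S3 prefix); this is not Ray's implementation, only an illustration of what the pattern above is expected to select.

```python
from fnmatch import fnmatchcase


def expand_glob(pattern, keys):
    """Return the keys that match a shell-style wildcard pattern, sorted.

    Hypothetical helper for illustration; not part of Ray Data.
    Note that fnmatch's `*` also matches `/`, unlike a strict path glob.
    """
    return sorted(k for k in keys if fnmatchcase(k, pattern))


# Assumed listing: 1,000 objects named like the example in this issue.
listing = [f"s3://my_data/part-{i:05d}.json.snappy" for i in range(1000)]

# The pattern keeps part-00000 through part-00099.
selected = expand_glob("s3://my_data/part-000*.json.snappy", listing)
print(len(selected))  # 100
```

As a workaround until wildcards are supported directly, a list expanded this way could be passed to a reader that accepts a list of paths, such as `ray.data.read_json`. Whether the reader decompresses `.snappy` files without extra options is not covered by this sketch.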
This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.
Please comment and remove the pending-cleanup label if you believe this issue should remain open.
Thanks for contributing to Ray!