
[Data] - Reading zipped JSONL files results in error.

Open simonsays1980 opened this issue 5 months ago • 0 comments

What happened + What you expected to happen

What happened

I ran the script below to read in data in zipped JSONL format and ran into this error:

ValueError: No input files found to read with the following file extensions: ['json', 'jsonl']. Please double check that 'file_extensions' field is set properly.

What you expected to happen

That ray.data can read zipped JSONL files out of the box.

Versions / Dependencies

Python 3.7.11
Linux Fedora 40
Ray master

Reproduction script

Run this script from the root folder of your Ray checkout.

from pathlib import Path

import ray

base_path = Path(__file__).parent / "rllib"
data_path = base_path / "tests/data/pendulum/enormous.zip"

# Read in the `SampleBatch` data using `ray.data.read_json`.
ds = ray.data.read_json(data_path.as_posix())
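
A possible workaround, offered only as a sketch and not as the fix for this issue: unpack the archive with the standard-library `zipfile` module first and pass the extracted files to `ray.data.read_json`. The helper names `extract_jsonl_from_zip` and `read_zipped_jsonl` are hypothetical, and the sketch assumes the archive members end in `.json` or `.jsonl` and that the extraction directory is readable by every node that executes the read.

```python
import tempfile
import zipfile
from typing import List


def extract_jsonl_from_zip(zip_path: str, target_dir: str) -> List[str]:
    """Extract the .json/.jsonl members of a zip archive.

    Hypothetical helper, not part of Ray. Returns the sorted paths of the
    extracted files; members with other extensions are skipped.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".json", ".jsonl")):
                # ZipFile.extract returns the path of the file it created.
                extracted.append(archive.extract(name, path=target_dir))
    return sorted(extracted)


def read_zipped_jsonl(zip_path: str):
    """Hypothetical helper: unpack the archive, then read it with Ray Data."""
    import ray  # imported lazily so the extraction helper works without Ray

    # mkdtemp is used (rather than a context manager) because Ray datasets
    # are read lazily and the files must still exist when they are consumed.
    target_dir = tempfile.mkdtemp(prefix="unzipped_jsonl_")
    return ray.data.read_json(extract_jsonl_from_zip(zip_path, target_dir))
```

With this in place, the last line of the reproduction script would become `ds = read_zipped_jsonl(data_path.as_posix())`. The extracted files stay in the temporary directory until they are deleted manually.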

Issue Severity

High: It blocks me from completing my task.

simonsays1980, Aug 28 '24 14:08