NeMo-Curator

Raise error for DocumentDataset if input path is empty

Open · praateekmahajan opened this issue 8 months ago · 0 comments

Currently, the error is raised at the Dask layer, which is not helpful to the user:

  File "/opt/NeMo-Text-Curator/nemo_curator/datasets/doc_dataset.py", line 220, in read_custom
    read_data(
  File "/opt/NeMo-Text-Curator/nemo_curator/utils/distributed_utils.py", line 604, in read_data
    return read_data_files_per_partition(
  File "/opt/NeMo-Text-Curator/nemo_curator/utils/distributed_utils.py", line 514, in read_data_files_per_partition
    output = dd.from_map(
  File "/nemo_curator/conda_envs/envs/text_curator/lib/python3.10/site-packages/dask_expr/_collection.py", line 5989, in from_map
    raise ValueError("All `iterables` must have a non-zero length")
ValueError: All `iterables` must have a non-zero length
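A minimal sketch of the requested check, assuming the reader receives either a single path or a list of paths (files, directories, or glob patterns). `resolve_input_files` is a hypothetical helper, not an existing NeMo-Curator function. The idea is to resolve the inputs up front and raise a descriptive error before an empty list ever reaches `dd.from_map`, so the user sees a message about their input instead of a Dask internals error.

```python
import glob
import os
from typing import List, Union


def resolve_input_files(input_files: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: expand input paths and fail early if none are found.

    Meant to run before the file list is handed to Dask, so an empty
    input produces a user-facing error rather than a Dask ValueError.
    """
    if isinstance(input_files, str):
        input_files = [input_files]

    resolved: List[str] = []
    for path in input_files:
        if os.path.isdir(path):
            # Collect the regular files directly inside the directory.
            resolved.extend(
                sorted(
                    os.path.join(path, name)
                    for name in os.listdir(path)
                    if os.path.isfile(os.path.join(path, name))
                )
            )
        else:
            # Treat anything else as a file path or glob pattern;
            # a missing path simply matches nothing.
            resolved.extend(sorted(glob.glob(path)))

    if not resolved:
        raise ValueError(
            f"No input files found for {input_files!r}. "
            "Check that the path exists and contains at least one file."
        )
    return resolved
```

Under this assumption, a `DocumentDataset` reader would call the helper first and pass only the returned, non-empty list on to Dask.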

praateekmahajan · Mar 10 '25 20:03