NeMo-Curator
Raise error for DocumentDataset if input path is empty
Currently, the error is raised at the Dask layer, which is not helpful to the user:
  File "/opt/NeMo-Text-Curator/nemo_curator/datasets/doc_dataset.py", line 220, in read_custom
    read_data(
  File "/opt/NeMo-Text-Curator/nemo_curator/utils/distributed_utils.py", line 604, in read_data
    return read_data_files_per_partition(
  File "/opt/NeMo-Text-Curator/nemo_curator/utils/distributed_utils.py", line 514, in read_data_files_per_partition
    output = dd.from_map(
  File "/nemo_curator/conda_envs/envs/text_curator/lib/python3.10/site-packages/dask_expr/_collection.py", line 5989, in from_map
    raise ValueError("All `iterables` must have a non-zero length")
ValueError: All `iterables` must have a non-zero length
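
One possible shape for the requested check is sketched below. It validates the resolved file list before anything is handed to Dask, so the user sees an error that names the offending path. The helper name `resolve_input_files` and its `file_extension` parameter are hypothetical and are not part of the NeMo-Curator API; the sketch uses only the Python standard library and assumes a local filesystem path.

```python
import os
from typing import List, Sequence, Union


def resolve_input_files(
    input_path: Union[str, Sequence[str]],
    file_extension: str = ".jsonl",
) -> List[str]:
    """Hypothetical helper: return the input files, or fail early with a clear message."""
    if isinstance(input_path, str):
        if os.path.isdir(input_path):
            # Collect matching files directly inside the directory (non-recursive).
            files = sorted(
                os.path.join(input_path, name)
                for name in os.listdir(input_path)
                if name.endswith(file_extension)
            )
        else:
            # Treat any other string as a single file path.
            files = [input_path]
    else:
        # Assume the caller already passed an explicit list of files.
        files = list(input_path)

    if not files:
        # Raise here, before the empty list ever reaches the Dask layer.
        raise ValueError(
            f"No '{file_extension}' files found for input path {input_path!r}; "
            "cannot build a dataset from an empty file list."
        )
    return files
```

Calling such a check at the start of the read path would replace the generic "All `iterables` must have a non-zero length" message with one that points at the empty input.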