AdaSeq [Question] How to solve [datasets.builder.DatasetGenerationError: An error occurred while generating the dataset]

What is your question?

Traceback (most recent call last):
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1618, in _prepare_split_single
    writer = writer_class(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\arrow_writer.py", line 334, in __init__
    self.stream = self._fs.open(fs_token_paths[2][0], "wb")
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\spec.py", line 1309, in open
    f = self._open(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 298, in __init__
    self._open()
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 303, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/shawn/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-c270794ce0d23d06/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\shawn\anaconda3\envs\pytorch\Scripts\adaseq.exe\__main__.py", line 7, in <module>
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\main.py", line 13, in run
    main(prog='adaseq')
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\__init__.py", line 29, in main
    args.func(args)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
    train_model(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
    trainer = build_trainer_from_partial_objects(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
    dm = DatasetManager.from_config(task=config.task, **config.dataset)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
    hfdataset = hf_load_dataset(path, name=name, **kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\load.py", line 1797, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 909, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1670, in _download_and_prepare
    super()._download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1004, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1508, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1665, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

What have you tried?

I set an HTTP proxy and successfully connected to YouTube.

Code (if necessary)

No response

What's your environment?

AdaSeq Version (e.g., 1.0 or master):
ModelScope Version (e.g., 1.0 or master):
PyTorch Version (e.g., 1.12.1):
OS (e.g., Ubuntu 20.04):
Python version:
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Oct 23 '23 02:10 Shawnzheng011019

The environment was set up automatically from the requirements.txt file.

Oct 23 '23 02:10 Shawnzheng011019

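I ran into the same problem. It looks like something may be wrong in AdaSeq's processing logic when it loads the dataset. The dataset format being loaded,

```text
data_type: json_spans
```

might be the issue.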
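If the suspicion is that the `json_spans` input is malformed, a quick pre-flight check can surface bad lines before `datasets` wraps the underlying failure in a generic `DatasetGenerationError`. Below is a minimal, stdlib-only sketch. The record layout it checks (a `text` string plus a `spans` list of objects with `start`, `end` and `type`, with `end` exclusive) is an assumption for illustration, not AdaSeq's documented schema, so adjust the field names to match your files.

```python
import json
from typing import Iterable, List


def validate_json_spans(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in JSON-lines span data.

    Hypothetical record shape (one JSON object per line):
        {"text": "...", "spans": [{"start": 0, "end": 2, "type": "PER"}]}
    `end` is treated as exclusive.
    """
    problems: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {lineno}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            problems.append(f"line {lineno}: expected an object with a string 'text'")
            continue
        text = record["text"]
        spans = record.get("spans", [])
        if not isinstance(spans, list):
            problems.append(f"line {lineno}: 'spans' is not a list")
            continue
        for span in spans:
            if not isinstance(span, dict) or "type" not in span:
                problems.append(f"line {lineno}: span {span!r} is not an object with a 'type'")
                continue
            start, end = span.get("start"), span.get("end")
            # Offsets must be integers that fall inside the text.
            if not (isinstance(start, int) and isinstance(end, int) and 0 <= start < end <= len(text)):
                problems.append(
                    f"line {lineno}: span {span!r} is out of range for text of length {len(text)}"
                )
    return problems
```

To use it, open the training file with `with open(path, encoding="utf-8") as f:` and print `validate_json_spans(f)`. An empty list means every line parsed and every span fit inside its text.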

Dec 16 '23 17:12 ykallan

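This happens because the dataset cannot be found, or because it is not in a standard format that can be parsed. You can rewrite the data loading by following the loading code for toy msra.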
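Since one suggested cause is that the dataset files cannot be found, it can help to confirm that the files the config points to actually exist and are non-empty before training. This is a minimal sketch. The folder and file names (`data`, `train.txt`, `dev.txt`, `test.txt`) are hypothetical placeholders, and this helper resolves relative paths against the current working directory.

```python
from pathlib import Path
from typing import Dict, Sequence


def check_data_files(data_dir: str, filenames: Sequence[str]) -> Dict[str, str]:
    """Report 'ok', 'empty' or 'missing' for each expected file under data_dir.

    Relative paths are resolved against the current working directory.
    """
    root = Path(data_dir).expanduser().resolve()
    report: Dict[str, str] = {}
    for name in filenames:
        path = root / name
        if not path.is_file():
            report[name] = "missing"
        elif path.stat().st_size == 0:
            report[name] = "empty"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Hypothetical layout: adjust to the folder your YAML config points at.
    for name, status in check_data_files("data", ["train.txt", "dev.txt", "test.txt"]).items():
        print(f"{name}: {status}")
```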

Mar 12 '24 13:03 PPPP-kaqiu

@PPPP-kaqiu Did you rewrite it? Could you share it?

Mar 19 '24 09:03 houyuchao

@Shawnzheng011019 Did you ever solve this, bro?

Apr 26 '24 09:04 lichen146

Write the data loading script strictly following the HF datasets format. In the YAML config's data loading section, just specify the data folder.
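A Hugging Face loading script subclasses `datasets.GeneratorBasedBuilder` and yields `(key, example)` pairs from its `_generate_examples` method. The stdlib-only sketch below shows what that generator body could look like, assuming CoNLL-style files (one `token tag` pair per line, blank lines between sentences). The field names `id`, `tokens` and `ner_tags` are hypothetical and should be matched to whatever the toy msra loading script produces. Raising a `ValueError` that names the file and line gives a clearer message than the generic error in the traceback above.

```python
from typing import Dict, Iterator, List, Tuple, Union

Example = Dict[str, Union[str, List[str]]]


def generate_conll_examples(filepath: str, encoding: str = "utf-8") -> Iterator[Tuple[int, Example]]:
    """Yield (key, example) pairs from a CoNLL-style file.

    Assumed format: one "token tag" pair per line (whitespace-separated),
    a blank line between sentences; "-DOCSTART-" lines are skipped.
    """
    tokens: List[str] = []
    tags: List[str] = []
    key = 0
    with open(filepath, encoding=encoding) as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line or line.startswith("-DOCSTART-"):
                # Sentence boundary: emit the buffered sentence, if any.
                if tokens:
                    yield key, {"id": str(key), "tokens": tokens, "ner_tags": tags}
                    key += 1
                    tokens, tags = [], []
                continue
            parts = line.split()
            if len(parts) < 2:
                raise ValueError(f"{filepath}:{lineno}: expected 'token tag', got {line!r}")
            tokens.append(parts[0])
            tags.append(parts[-1])
    # The last sentence may not be followed by a blank line.
    if tokens:
        yield key, {"id": str(key), "tokens": tokens, "ner_tags": tags}
```

Inside a real loading script, `_split_generators` would return one `datasets.SplitGenerator` per split with `gen_kwargs={"filepath": ...}`, and `_generate_examples(self, filepath)` could simply `yield from generate_conll_examples(filepath)`.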

Apr 26 '24 09:04 PPPP-kaqiu

@PPPP-kaqiu Could I add you on WeChat, bro? I'd really appreciate some guidance. WX: Xugeyuan923

Apr 26 '24 09:04 lichen146

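> Write the data loading script strictly following the HF datasets format. In the YAML config's data loading section, just specify the data folder.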

Hi, guru, could you share how you solved it?

Jul 21 '24 09:07 houyuchao
