AdaSeq
[Question] How to solve [datasets.builder.DatasetGenerationError: An error occurred while generating the dataset]
What is your question?
Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1618, in _prepare_split_single
writer = writer_class(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\arrow_writer.py", line 334, in __init__
self.stream = self._fs.open(fs_token_paths[2][0], "wb")
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\spec.py", line 1309, in open
f = self._open(
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 298, in __init__
self._open()
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 303, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/shawn/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-c270794ce0d23d06/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\shawn\anaconda3\envs\pytorch\Scripts\adaseq.exe\__main__.py", line 7, in <module>
What have you tried?
I set an HTTP proxy and confirmed it works (I could connect to YouTube), but the error persists.
Code (if necessary)
No response
What's your environment?
- AdaSeq Version (e.g., 1.0 or master):
- ModelScope Version (e.g., 1.0 or master):
- PyTorch Version (e.g., 1.12.1):
- OS (e.g., Ubuntu 20.04):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
The environment was set up automatically from the requirements.txt file.
I ran into the same problem. It looks like something may be wrong in the logic AdaSeq uses when loading the dataset. The dataset format setting `data_type: json_spans` may be the culprit.
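This happens because the dataset cannot be found, or because it is not in a format the standard parser understands. You can rewrite the data loading by following the loading code for toy msra.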
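Both conditions named above (file missing, or file not parseable) can be checked before training starts. The following is a minimal sketch using only the Python standard library. The function name `check_json_spans_file` is hypothetical, and the record layout it expects (one JSON object per line, with a `text` string and a `spans` list whose items carry `start`, `end` and `type`) is an assumption about the `json_spans` format that should be verified against the AdaSeq documentation.

```python
import json
from pathlib import Path


def check_json_spans_file(path):
    """Return a list of human-readable problems found in a JSON-lines file.

    Assumed (not confirmed) record layout, one JSON object per line:
    {"text": "...", "spans": [{"start": 0, "end": 2, "type": "LOC"}]}
    """
    file_path = Path(path)
    if not file_path.is_file():
        return [f"file not found: {file_path}"]

    problems = []
    with file_path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            if not isinstance(record.get("text"), str):
                problems.append(f"line {line_no}: missing or non-string 'text'")
            spans = record.get("spans")
            if not isinstance(spans, list):
                problems.append(f"line {line_no}: missing or non-list 'spans'")
                continue
            for span in spans:
                has_keys = isinstance(span, dict) and all(
                    key in span for key in ("start", "end", "type")
                )
                if not has_keys:
                    problems.append(
                        f"line {line_no}: span lacks start/end/type: {span!r}"
                    )
    return problems
```

An empty list means the file exists and every line matched the assumed layout. Anything else points at the line to fix before handing the file to the loader.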
@PPPP-kaqiu Did you rewrite it? Could you share it?
@Shawnzheng011019 Did you manage to solve this?
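Write the data loading script strictly in the HF dataset format. Then, in the YAML, the data loading entry only needs to point to the folder that holds the data.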
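The core of a Hugging Face style loading script is an example generator that yields `(key, example)` pairs. In a real script this logic lives in the `_generate_examples` method of a `datasets.GeneratorBasedBuilder` subclass. The sketch below shows only that part, in plain Python so it can be tried without installing anything. The function name `generate_examples`, the CoNLL-style input (one `token label` pair per line, with a blank line between sentences) and the output field names `id`, `tokens` and `labels` are assumptions for illustration, not AdaSeq's confirmed schema.

```python
def generate_examples(filepath):
    """Yield (key, example) pairs from a CoNLL-style file.

    Assumed input: one "token label" pair per line, sentences separated
    by a blank line. The first column is taken as the token and the last
    column as the label.
    """
    tokens, labels = [], []
    key = 0
    with open(filepath, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                # A blank line closes the current sentence, if any.
                if tokens:
                    yield key, {"id": str(key), "tokens": tokens, "labels": labels}
                    key += 1
                    tokens, labels = [], []
                continue
            parts = line.split()
            tokens.append(parts[0])
            labels.append(parts[-1])
    # Emit the final sentence when the file does not end with a blank line.
    if tokens:
        yield key, {"id": str(key), "tokens": tokens, "labels": labels}
```

Inside an actual builder, the same loop would sit in `_generate_examples`, with the feature schema declared in `_info` and the file paths resolved in `_split_generators`.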
@PPPP-kaqiu Could we connect on WeChat? I would really appreciate your guidance. WX: Xugeyuan923
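> Write the data loading script strictly in the HF dataset format. Then, in the YAML, the data loading entry only needs to point to the folder that holds the data.

Hello, could you share how you solved it?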