PaddleNLP
Few-shot EFL -- ValueError: Invalid name "my_dataset". Should be one of ['bustm', 'chid', 'iflytek', 'tnews', 'eprstmt', 'ocnli', 'csldcp', 'cluewsc', 'csl'].
I have successfully got the example running with several of the FewCLUE datasets on Google Colab and am now attempting to run the few-shot EFL example using my own dataset. I'm doing this by hosting it locally within the dataset folder and updating the fewclue.py BUILDER_CONFIGS, the data.py processor methods, task_label_description.py, predict.py, and train.py accordingly. I have ensured my dataset is made up of JSON files similar to those provided in FewCLUE.
Unfortunately, when I attempt to import the new dataset using the below code:
```python
from paddlenlp.datasets import load_dataset

train_ds, dev_ds, public_test_ds = load_dataset("fewclue", name="my_dataset", splits=("train_0", "dev_0", "test_public"))
```
I receive the following error message:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-3178f63bfe48> in <module>()
      2
      3 # Load the dataset in FewCLUE with one click by specifying "fewclue" and the dataset name
----> 4 train_ds, dev_ds, public_test_ds = load_dataset("fewclue", name="my_dataset", splits=("train_0", "dev_0", "test_public"))

/usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/dataset.py in load_dataset(path_or_read_func, name, data_files, splits, lazy, **kwargs)
    214             raise ValueError(
    215                 'Invalid name "{}". Should be one of {}.'.format(
--> 216                     name, list(reader_cls.BUILDER_CONFIGS.keys())))
    217         elif hasattr(reader_instance, 'SPLITS'):
    218             split_names = reader_instance.SPLITS.keys()

ValueError: Invalid name "my_dataset". Should be one of ['bustm', 'chid', 'iflytek', 'tnews', 'eprstmt', 'ocnli', 'csldcp', 'cluewsc', 'csl'].
```
I'm unsure where I can update these BUILDER_CONFIGS other than in fewclue.py, which I've already updated. I'm also unsure where this /usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/dataset.py file is coming from, as I'm working in Google Colab.
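For reference, the check that raises this error can be sketched as a plain dictionary-key lookup. The snippet below is a paraphrase of the traceback's logic, not the actual PaddleNLP source: `load_dataset` resolves the reader class for "fewclue" and rejects any `name` missing from that class's `BUILDER_CONFIGS` keys.

```python
# Paraphrased sketch of the name validation seen in the traceback
# (paddlenlp/datasets/dataset.py), not the actual source. The requested name
# must be a key of the BUILDER_CONFIGS dict on whichever fewclue.py Python
# actually imported.
BUILDER_CONFIGS = {"bustm": {}, "chid": {}, "iflytek": {}}  # illustrative keys only

def validate_name(name):
    if name not in BUILDER_CONFIGS:
        raise ValueError('Invalid name "{}". Should be one of {}.'.format(
            name, list(BUILDER_CONFIGS.keys())))

validate_name("bustm")         # passes silently
# validate_name("my_dataset")  # would raise ValueError, as in the traceback
```

The key point: the error message lists the keys of the `BUILDER_CONFIGS` that Python actually loaded, so if "my_dataset" is absent from the list, the edited file is not the one being imported.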
Specs
- PaddleNLP version: 2.3.4
- PaddlePaddle version: gpu-2.3.1
- System: macOS, but running on a GPU via Google Colab
```
Traceback (most recent call last):
  File "/content/PaddleNLP/examples/few_shot/efl/train.py", line 290, in
```
Please ensure your class `FewCLUE`'s class member `BUILDER_CONFIGS` in fewclue.py has the key your dataset_name. Just like this
Thanks for your reply. I do currently have it shown there, above `bustm`. Based on the screenshots below, does it look like this is in the correct location? Is `BUILDER_CONFIGS` set up in any other file?


> Please ensure your class `FewCLUE`'s class member `BUILDER_CONFIGS` in fewclue.py has the key your dataset_name. Just like this
Just realised I didn't quote reply before. Please see my comment above!
It's very strange, because the datasets in the error message appear in the same order as the `BUILDER_CONFIGS` entries in fewclue.py, so presumably that is where the error is coming from.
I have the feeling there's another version of this file that's shadowing the correct one for some reason. Perhaps it's the copy under the /usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/ path referenced in the error message, but I don't know where that path has come from or why it's being used in the first place.
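One way to test that shadowing hypothesis is to ask Python for the file a given module name would load. The snippet below uses a stdlib module as a stand-in so it runs without PaddleNLP installed; substituting "paddlenlp.datasets.fewclue" (assuming that is the module path in the installed version) would reveal whether Colab is importing the edited clone or the pip-installed copy in dist-packages.

```python
import importlib.util

def module_path(name):
    """Return the file Python would load for the given module name, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Stand-in for "paddlenlp.datasets.fewclue" so this snippet runs anywhere;
# with PaddleNLP installed, pass that dotted name instead.
print(module_path("json"))
```

If the printed path points into dist-packages rather than /content/PaddleNLP, the pip-installed package is shadowing the local edits; reinstalling the clone in editable mode (`pip install -e .`, a standard pip option rather than anything PaddleNLP-specific) would make Python pick up the modified fewclue.py.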
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.