Few-shot EFL -- ValueError: Invalid name "my_dataset". Should be one of ['bustm', 'chid', 'iflytek', 'tnews', 'eprstmt', 'ocnli', 'csldcp', 'cluewsc', 'csl'].

Open laurenceandrews opened this issue 2 years ago • 3 comments

I have successfully got the example running with several of the FewCLUE datasets via Google Colab and am now attempting to run the few-shot EFL example on my own dataset. I'm doing this by hosting it locally within the dataset folder and updating the fewclue.py BUILDER_CONFIGS, the data.py Processor methods, task_label_description.py, predict.py and train.py accordingly. My dataset is made up of JSON files similar to those provided in FewCLUE.

Unfortunately, when I attempt to import the new dataset using the below code:

from paddlenlp.datasets import load_dataset
train_ds, dev_ds, public_test_ds = load_dataset("fewclue", name="my_dataset", splits=("train_0", "dev_0", "test_public"))

I receive the following error message:

`---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-20-3178f63bfe48>](https://localhost:8080/#) in <module>()
      2 
      3 # Load the dataset in FewCLUE with one click by specifying "fewclue" and the dataset name
----> 4 train_ds, dev_ds, public_test_ds = load_dataset("fewclue", name="my_dataset", splits=("train_0", "dev_0", "test_public"))

[/usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/dataset.py](https://localhost:8080/#) in load_dataset(path_or_read_func, name, data_files, splits, lazy, **kwargs)
    214                     raise ValueError(
    215                         'Invalid name "{}". Should be one of {}.'.format(
--> 216                             name, list(reader_cls.BUILDER_CONFIGS.keys())))
    217             elif hasattr(reader_instance, 'SPLITS'):
    218                 split_names = reader_instance.SPLITS.keys()

ValueError: Invalid name "my_dataset". Should be one of ['bustm', 'chid', 'iflytek', 'tnews', 'eprstmt', 'ocnli', 'csldcp', 'cluewsc', 'csl'].`

I'm unsure where I can update these BUILDER_CONFIGS other than in the fewclue.py file, which I've already updated. I'm also unsure where the /usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/dataset.py file is coming from, as I'm working in Google Colab.
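One way to see which file Python would actually import is to ask the import system for the module's origin. The snippet below is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name, and it is demonstrated on `json` so that it runs anywhere. In the Colab session the same call could be made with `paddlenlp.datasets.dataset`, the module named in the traceback.

```python
import importlib.util


def module_origin(module_name):
    """Return the file path a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    return spec.origin if spec is not None else None


# Demonstrated on a standard-library module so the snippet runs anywhere.
print(module_origin("json"))

# In the Colab session, the same call would show which copy is in use:
# print(module_origin("paddlenlp.datasets.dataset"))
```

If the printed path points into dist-packages rather than into the cloned repository, the edits made in the clone are not the ones being loaded.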


Specs

PaddleNLP version: 2.3.4
PaddlePaddle version: gpu-2.3.1
System: macOS, but running on a GPU via Google Colab

Traceback (most recent call last):
  File "/content/PaddleNLP/examples/few_shot/efl/train.py", line 290, in <module>
    do_train()
  File "/content/PaddleNLP/examples/few_shot/efl/train.py", line 133, in do_train
    splits=("train_0",
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/dataset.py", line 216, in load_dataset
    name, list(reader_cls.BUILDER_CONFIGS.keys())))
ValueError: Invalid name "kami". Should be one of ['bustm', 'chid', 'iflytek', 'tnews', 'eprstmt', 'ocnli', 'csldcp', 'cluewsc', 'csl'].
INFO 2022-07-25 21:28:11,020 launch_utils.py:343] terminate all the procs

laurenceandrews, Jul 25 '22 22:07

Please make sure that the BUILDER_CONFIGS class member of the FewCLUE class in fewclue.py contains a key for your dataset name, as in this image.
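For illustration, the name check that produces this error can be sketched roughly as below. This is a simplified stand-in modelled on the three lines visible in the traceback, not PaddleNLP's actual implementation; `DemoReader` and `check_name` are hypothetical names and the config values are placeholders.

```python
class DemoReader:
    """Hypothetical stand-in for a dataset reader; not PaddleNLP's real class."""

    BUILDER_CONFIGS = {
        "bustm": {"labels": ["0", "1"]},
        "tnews": {"labels": None},
    }


def check_name(reader_cls, name):
    """Return the config for `name`, or raise the same style of error as in the traceback."""
    if name not in reader_cls.BUILDER_CONFIGS:
        raise ValueError(
            'Invalid name "{}". Should be one of {}.'.format(
                name, list(reader_cls.BUILDER_CONFIGS.keys())))
    return reader_cls.BUILDER_CONFIGS[name]


# Adding the key to the class that is actually imported makes the check pass.
DemoReader.BUILDER_CONFIGS["my_dataset"] = {"labels": ["0", "1"]}
print(check_name(DemoReader, "my_dataset"))
```

The list in the error message is built from the keys of whichever class object was imported at run time, so a key added to a different copy of the file would not appear in it.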

tianxin1860, Jul 26 '22 06:07

Thanks for your reply. I do currently have it shown here, above bustm. Based on the screenshots below, does it look like this is in the correct location? Are BUILDER_CONFIGS set up in any other file?

[Screenshot 2022-07-26 at 10 28 25]
[Screenshot 2022-07-26 at 10 29 44]

laurenceandrews, Jul 26 '22 09:07

> Please make sure that the BUILDER_CONFIGS class member of the FewCLUE class in fewclue.py contains a key for your dataset name, as in this image.

Just realised I didn't quote reply before. Please see my comment above!

It's very strange, because the order that the datasets are listed in the error message is the same order that the BUILDER_CONFIGS are listed in the fewclue.py file, so presumably that is where it's coming from.

I have the feeling there's another version of this file that's overwriting the correct one for some reason. Perhaps in the "/usr/local/lib/python3.7/dist-packages/paddlenlp/datasets/" path which is referenced in the error message, but I don't know where this path has come from or why this is being used in the first place.
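The mechanism suspected here can be reproduced in isolation. The sketch below is only an illustration of how Python resolves two files with the same module name; it has not been confirmed as the cause in this thread. It creates two throwaway directories, one standing in for dist-packages and one for the edited clone, and shows that the copy found first on `sys.path` is the one that gets imported. The module name `demo_fewclue` and the helper `write_module` are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, module_name, keys):
    """Write a tiny module exposing a BUILDER_CONFIGS dict with the given keys."""
    path = os.path.join(directory, module_name + ".py")
    with open(path, "w", encoding="utf-8") as f:
        f.write("BUILDER_CONFIGS = {}\n".format({k: {} for k in keys}))
    return path


installed_dir = tempfile.mkdtemp()  # plays the role of dist-packages
edited_dir = tempfile.mkdtemp()     # plays the role of the edited clone

write_module(installed_dir, "demo_fewclue", ["bustm", "chid"])
write_module(edited_dir, "demo_fewclue", ["my_dataset", "bustm", "chid"])

# Both directories are importable, but the "installed" one is searched first.
sys.path.insert(0, edited_dir)
sys.path.insert(0, installed_dir)
importlib.invalidate_caches()

demo = importlib.import_module("demo_fewclue")
loaded_keys = list(demo.BUILDER_CONFIGS)
loaded_from = demo.__file__
print(loaded_keys)
print(loaded_from)
```

If something similar is happening in Colab, the key would need to be added to the copy that is actually imported, or the edited clone would need to take its place on the import path.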

laurenceandrews, Jul 27 '22 16:07

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot], Dec 08 '22 06:12

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Dec 22 '22 16:12