datasets The speechocean762 dataset

speechocean762 is a non-native English corpus for pronunciation scoring tasks. It is free for both commercial and non-commercial use.

I believe it would be easier to use if it were available on Hugging Face.

Jul 06 '22 06:07 jimbozhang

CircleCI reported two errors, but I couldn't find the cause. The error message:

_________________ ERROR collecting tests/test_dataset_cards.py _________________
tests/test_dataset_cards.py:53: in <module>
    @pytest.mark.parametrize("dataset_name", get_changed_datasets(repo_path))
tests/test_dataset_cards.py:35: in get_changed_datasets
    diff_output = check_output(["git", "diff", "--name-only", "origin/master...HEAD"], cwd=repo_path)
../.pyenv/versions/3.6.15/lib/python3.6/subprocess.py:356: in check_output
    **kwargs).stdout
../.pyenv/versions/3.6.15/lib/python3.6/subprocess.py:438: in run
    output=stdout, stderr=stderr)
E   subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', 'origin/master...HEAD']' returned non-zero exit status 128.

=========================== short test summary info ============================
ERROR tests/test_dataset_cards.py - subprocess.CalledProcessError: Command '[...
ERROR tests/test_dataset_cards.py - subprocess.CalledProcessError: Command '[...
= 4011 passed, 2357 skipped, 2 xfailed, 1 xpassed, 116 warnings, 2 errors in 284.32s (0:04:44) =

Exited with code exit status 1

I'm not sure if it was caused by this PR ...

I ran tests/test_dataset_cards.py in my local environment, and it passed:

(venv)$ pytest tests/test_dataset_cards.py
============================== test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/zhangjunbo/src/datasets
plugins: forked-1.4.0, datadir-1.3.1, xdist-2.5.0
collected 1531 items

tests/test_dataset_cards.py ..... [100%]
======================= 766 passed, 765 skipped in 2.55s ========================
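
One way to narrow this down is to rerun the same git command while keeping its stderr, since `check_output` only reports the exit status. The sketch below is a hypothetical helper, not code from the `datasets` repository: the names `run_capturing_stderr` and `diagnose_changed_files` are made up for illustration, and the default ref `origin/master` is simply copied from the traceback. Git typically uses exit status 128 for fatal errors such as an unknown revision, so one possible (unconfirmed) explanation is that `origin/master` was not available in the CI checkout.

```python
import subprocess


def run_capturing_stderr(cmd, cwd=None):
    """Run a command and return (exit code, stdout, stderr) without raising."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def diagnose_changed_files(repo_path=".", base_ref="origin/master"):
    """Rerun the diff from the failing test and surface git's own error text.

    base_ref is an assumption taken from the traceback; adjust it to the
    branch that actually exists in the checkout being inspected.
    """
    code, out, err = run_capturing_stderr(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"], cwd=repo_path
    )
    if code != 0:
        # Return git's message instead of only the numeric exit status.
        return f"git exited with {code}: {err.strip()}"
    return out.splitlines()
```

Calling `diagnose_changed_files()` from the repository root in the failing environment should return either the list of changed files or git's own error message, which would show whether the missing ref is really the problem.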

Jul 07 '22 12:07 jimbozhang

@sanchit-gandhi could you also maybe take a quick look? :-)

Jul 22 '22 16:07 patrickvonplaten

Thanks for your contribution, @jimbozhang. Are you still interested in adding this dataset?

We are removing the dataset scripts from this GitHub repo and moving them to the Hugging Face Hub: https://huggingface.co/datasets

We suggest you create this dataset there. Please feel free to let us know if you need any help.

Sep 30 '22 14:09 albertvillanova

Yes, I was just planning to finish this dataset in the next few days, so this suggestion comes just in time! Thanks a lot! I will create the dataset on the Hugging Face Hub soon, maybe this week.

Sep 30 '22 22:09 jimbozhang
