The speechocean762 dataset
speechocean762 is a non-native English corpus for pronunciation scoring tasks. It is free for both commercial and non-commercial use.
I believe it would be easier to use if it were available on Hugging Face.
CircleCI reported two errors, but I couldn't find the cause. The error message:
_________________ ERROR collecting tests/test_dataset_cards.py _________________
tests/test_dataset_cards.py:53: in <module>
@pytest.mark.parametrize("dataset_name", get_changed_datasets(repo_path))
tests/test_dataset_cards.py:35: in get_changed_datasets
diff_output = check_output(["git", "diff", "--name-only", "origin/master...HEAD"], cwd=repo_path)
../.pyenv/versions/3.6.15/lib/python3.6/subprocess.py:356: in check_output
**kwargs).stdout
../.pyenv/versions/3.6.15/lib/python3.6/subprocess.py:438: in run
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', 'origin/master...HEAD']' returned non-zero exit status 128.
=========================== short test summary info ============================
ERROR tests/test_dataset_cards.py - subprocess.CalledProcessError: Command '[...
ERROR tests/test_dataset_cards.py - subprocess.CalledProcessError: Command '[...
= 4011 passed, 2357 skipped, 2 xfailed, 1 xpassed, 116 warnings, 2 errors in 284.32s (0:04:44) =
Exited with code exit status 1
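For what it's worth, git typically exits with status 128 when it cannot resolve a ref, so a CI checkout that lacks origin/master (e.g. a shallow clone) would fail exactly like this even when the same command works locally. Here is a minimal sketch of a guard, reusing the get_changed_datasets name and command from the traceback above; the fetch fallback is my assumption, not the repo's actual test code:

```python
import subprocess

def get_changed_datasets(repo_path):
    """List files changed relative to the base branch.

    Sketch of a guard around the call from the traceback above;
    the fetch fallback is an assumption, not the real test code.
    """
    try:
        # `git diff` exits with status 128 when a ref such as
        # origin/master cannot be resolved, e.g. in a shallow CI clone.
        diff_output = subprocess.check_output(
            ["git", "diff", "--name-only", "origin/master...HEAD"],
            cwd=repo_path,
        )
    except subprocess.CalledProcessError:
        # Fetch the base branch explicitly, then diff against FETCH_HEAD.
        subprocess.check_call(["git", "fetch", "origin", "master"], cwd=repo_path)
        diff_output = subprocess.check_output(
            ["git", "diff", "--name-only", "FETCH_HEAD...HEAD"],
            cwd=repo_path,
        )
    return diff_output.decode().splitlines()
```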
I'm not sure whether it was caused by this PR. I ran tests/test_dataset_cards.py in my local environment, and it passed:
(venv)$ pytest tests/test_dataset_cards.py
============================== test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/zhangjunbo/src/datasets
plugins: forked-1.4.0, datadir-1.3.1, xdist-2.5.0
collected 1531 items
tests/test_dataset_cards.py ..... [100%]
======================= 766 passed, 765 skipped in 2.55s ========================
@sanchit-gandhi could you maybe take a quick look as well? :-)
Thanks for your contribution, @jimbozhang. Are you still interested in adding this dataset?
We are removing the dataset scripts from this GitHub repo and moving them to the Hugging Face Hub: https://huggingface.co/datasets
We would suggest you create this dataset there. Please, feel free to tell us if you need some help.
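For reference, one way to create a dataset on the Hub is to build it locally with the datasets library and call push_to_hub. A minimal sketch, assuming the corpus has already been downloaded; the column names, file path, and the jimbozhang/speechocean762 repo id are illustrative, not the dataset's actual schema:

```python
from datasets import Audio, Dataset

# Illustrative columns only; the real speechocean762 schema differs.
ds = Dataset.from_dict(
    {
        "audio": ["WAVE/SPEAKER0001/000010011.WAV"],
        "text": ["we call it bear"],
        "accuracy": [8],
    }
)

# Decode the file paths into audio features (sampling rate is an assumption).
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Upload to the Hub (requires `huggingface-cli login` beforehand).
ds.push_to_hub("jimbozhang/speechocean762")
```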
Yes, I was just planning to finish this dataset in the coming days, so this suggestion comes just in time. Thanks a lot! I will create this dataset on the Hugging Face Hub soon, maybe this week.