hf-mirror-site
hf-mirror-site copied to clipboard
下载“allenai/c4”数据集出错
感谢提供这么方便的镜像。我想下载“allenai/c4”但遇到问题,麻烦看一下。复现方法如下:
编写脚本
# download_allenai_c4.py
from datasets import load_dataset
ds = load_dataset('allenai/c4', data_files={'validation': 'en/c4-validation.00003-of-00008.json.gz'}, split='validation', download_mode="force_redownload")
执行
export HF_ENDPOINT=https://hf-mirror.com
python download_allenai_c4.py
出错
Downloading readme: 41.1kB [00:00, 41.1MB/s]
Traceback (most recent call last):
File ".\demo.py", line 8, in <module>
ds = load_dataset('allenai/c4', data_files={'validation': 'en/c4-validation.00003-of-00008.json.gz'}, split='validation', download_mode="force_redownload")
File "D:\SoftwareInstall\Python\lib\site-packages\datasets\load.py", line 2594, in loa builder_instance = load_dataset_builder(
File "D:\SoftwareInstall\Python\lib\site-packages\datasets\load.py", line 2266, in load_dataset_builder
dataset_module = dataset_module_factory(
raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at C:\Users\ZhangGe\Desktop\allenai\c4\c4.py or any data file in the same directory. Couldn't find 'allenai/c4' on the Hugging Face Hub either: FileNotFoundError: Unable to find 'hf://datasets/allenai/c4@1588ec454eed extension ['.csv', '.tsv', '.json', '.jsonl', '.parquet', '.geoparquet', '.gpq', '.arrow', '.txt', '.tar', '.blp', '.bmp', '.dib', '.bufr', '.cur', '.pcx', '.dcx', '.dds', '.ps', '.eps', '.fit', '.fits', '.fli', '.flc', '.ftc', '.ftu', '.gbr', '.gif', '.grib',
'.h5', '.hdf', '.png', '.apng', '.jp2', '.j2k', '.jpc', '.jpf', '.jpx', '.j2c', '.icns',pm', '.BLP', '.BMP', '.DIB', '.BUFR', '.CUR', '.PCX', '.DCX', '.DDS', '.PS', '.EPS', '.FIT', '.FITS', '.FLI', '.FLC', '.FTC', '.FTU', '.GBR', '.GIF', '.GRIB', '.H5', '.HDF', '.PNG', '.APNG', '.JP2', '.J2K', '.JPC', '.JPF', '.JPX', '.J2C', '.ICNS', '.ICO', '.IM', '.IIM', '.TIF', '.TIFF', '.JFIF', '.JPE', '.JPG', '.JPEG', '.MPG', '.MPEG', '.MSP', '.PCD', '.PXR', '.PBM', '.PGM', '.PPM', '.PNM', '.PSD', '.BW', '.RGB', '.RGBA', '.SGI', '.RAS', '.TGA', '.ICB', '.VDA', '.VST', '.WEBP', '.WMF', '.EMF', '.XBM', '.XPM', '.aiff', '.au', '.avr', '.caf', '.flac', '.htk', '.svx', '.mat4', '.mat5', '.mpc2k', '.ogg', '.paf', '.pvf', '.raw', '.rf64', '.sd2', '.sds', '.ircam', '.voc', '.w64', '.wav', '.nist', '.wavex', '.wve', '.xi', '.mp3', '.opus', '.AIFF', '.AU', '.AVR', '.CAF', '.FLAC', '.HTK',
'.SVX', '.MAT4', '.MAT5', '.MPC2K', '.OGG', '.PAF', '.PVF', '.RAW', '.RF64', '.SD2', '.SDS', '.IRCAM', '.VOC', '.W64', '.WAV', '.NIST', '.WAVEX', '.WVE', '.XI', '.MP3', '.OPUS', '.zip']