Issue with running prepare.py

Open torial opened this issue 2 years ago • 3 comments

I received the following error while running python prepare.py:

`Traceback (most recent call last):
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1570, in _prepare_split_single
    for key, record in generator:
  File "C:\Users\fresh\.cache\huggingface\modules\datasets_modules\datasets\openwebtext\85b3ae7051d2d72e7c5fdf6dfb462603aaa26e9ed506202bf3a24d261c6c40a1\openwebtext.py", line 85, in _generate_examples
    with open(filepath, encoding="utf-8") as f:
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\streaming.py", line 69, in wrapper
    return function(*args, use_auth_token=use_auth_token, **kwargs)
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\download\streaming_download_manager.py", line 445, in xopen
    return open(main_hop, mode, *args, **kwargs)
OSError: [Errno 22] Invalid argument: 'C:\Users\fresh\.cache\huggingface\datasets\downloads\extracted\f03a89c11b1133c3973ac7aed71b6be5c62feb33c5ec06cffb06511974f7194e\001 5896-b1054262f7da52a0518521e29c8e352c.txt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\fresh\Downloads\nanoGPT\data\openwebtext\prepare.py", line 14, in <module>
    dataset = load_dataset("openwebtext")
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\load.py", line 1757, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 860, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1611, in _download_and_prepare
    super()._download_and_prepare(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 953, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1449, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Users\fresh\AppData\Roaming\Python\Python39\site-packages\datasets\builder.py", line 1606, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset`

torial avatar Jan 13 '23 17:01 torial

I ran into similar issues with prepare.py as well. I don't know if this applies to your case, but I solved mine by setting num_proc to 1 on line 11. Hope this helps.
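For anyone trying this, the change would look roughly like the fragment below. The variable name `num_proc` comes from the comment above; the surrounding file contents and the previous value are assumptions, since the thread does not quote them.

```python
# data/openwebtext/prepare.py, around line 11
# (exact file contents are assumed here, not quoted in this thread)

# Number of worker processes used when the script processes the dataset.
# Setting it to 1 avoids multiprocessing altogether.
num_proc = 1
```

Note that the traceback above fails inside the `load_dataset("openwebtext")` call on line 14, which is not passed `num_proc`, so this change may have no effect on that particular step.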

lowkick avatar Jan 14 '23 14:01 lowkick

Unfortunately that didn't help, but I appreciate the suggestion.

torial avatar Jan 14 '23 19:01 torial

I also got this error when running on Windows. I had to turn off Windows Defender and re-download the dataset to get past this.
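A rough way to check whether the extracted files can be opened at all, before or after re-downloading, is sketched below. `find_unreadable_files` is a hypothetical helper written for this thread, not part of the `datasets` library, and it only tests whether each file can be opened and read. It does not verify the file contents.

```python
import os


def find_unreadable_files(root):
    """Return (path, error) pairs for files under `root` that cannot be opened."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Reading a single byte is enough to surface an OSError
                # for a file that is blocked, locked, or otherwise unopenable.
                with open(path, "rb") as f:
                    f.read(1)
            except OSError as exc:
                failures.append((path, exc))
    return failures
```

Calling it on the `downloads\extracted` folder shown in the traceback (the location on your machine may differ) returns the files that raise `OSError`. An empty list means every file under that folder could be opened.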

PhillzMike avatar Jan 16 '23 19:01 PhillzMike