
Continued pre-training fails with RuntimeError: Failed processing /tmp/data

Open BestJiayi opened this issue 1 year ago • 5 comments

How can I solve this problem?

I downloaded pythia-160m from Hugging Face.
I prepared my data following the official documentation. The program runs inside an NVIDIA Docker container.

I followed the official documentation for continued pre-training and ran the command below, but it failed with RuntimeError: Failed processing /tmp/data.

```bash
litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts" \
  --out_dir out/custom_model
```

I got:

```
RuntimeError:
We found the following error Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 628, in _handle_data_chunk_recipe
    for item_data in item_data_or_generator:
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/functions.py", line 151, in _prepare_item_generator
    yield from self._fn(item_metadata)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/litgpt/data/text_files.py", line 124, in tokenize
    with open(filename, "r", encoding="utf-8") as file:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 423, in run
    self._loop()
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 472, in _loop
    self._handle_data_chunk_recipe(index)
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 638, in _handle_data_chunk_recipe
    raise RuntimeError(f"Failed processing {self.items[index]}") from e
RuntimeError: Failed processing /tmp/data
```
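Reading the traceback bottom-up: the `tokenize` worker in `litgpt/data/text_files.py` calls `open(filename, ...)`, but the item litdata handed it was `/tmp/data`, which is a directory rather than a text file. On Linux, Python raises `IsADirectoryError` in that case, and litdata wraps it in the outer `RuntimeError`. The sketch below is not litgpt's code; it only reproduces that failure mode and shows a local pre-flight check of the training folder. `read_text_file` and `list_text_files` are hypothetical helpers, and the assumption that the training inputs are the `*.txt` files directly inside the folder passed to `--data.train_data_path` (here `custom_texts`) should be checked against the litgpt version in use.

```python
import glob
import os
import tempfile
from pathlib import Path


def read_text_file(filename: str) -> str:
    # Hypothetical helper mirroring the open() call on the failing line of the
    # traceback. Passing a directory here raises IsADirectoryError on Linux.
    with open(filename, "r", encoding="utf-8") as file:
        return file.read()


def list_text_files(train_data_path: str) -> list[str]:
    # Hypothetical pre-flight check. Assumption: training inputs are the *.txt
    # files directly inside the folder given to --data.train_data_path.
    files = sorted(glob.glob(os.path.join(train_data_path, "*.txt")))
    # Keep only regular files, so a directory whose name ends in ".txt" is skipped
    return [f for f in files if os.path.isfile(f)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        Path(data_dir, "a.txt").write_text("hello", encoding="utf-8")
        Path(data_dir, "b.txt").write_text("world", encoding="utf-8")

        # Passing the directory itself reproduces the inner error from the traceback
        try:
            read_text_file(data_dir)
        except IsADirectoryError as err:
            print("reproduced:", err)

        # Passing the individual files works as expected
        for path in list_text_files(data_dir):
            print(os.path.basename(path), read_text_file(path))
```

A clean result from the check only rules out a local data-layout problem; it does not work around the underlying bug tracked in the linked issue.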

BestJiayi, May 13 '24 10:05

Same issue as in https://github.com/Lightning-AI/litgpt/issues/1402

cc @awaelchli

carmocca, May 13 '24 11:05

@carmocca If I want to keep using litgpt for pre-training, what should I do? Should I wait until the bug is fixed before using it again? Thank you!

BestJiayi, May 13 '24 12:05

Are you using Google Colab? You could try https://lightning.ai while this gets fixed; it should work there without issues.

carmocca, May 13 '24 12:05

Thank you. I am currently running litgpt on our company's GPU cluster, so I will wait for the issue to be fixed before continuing.

BestJiayi, May 14 '24 02:05