replicate-python
Llama-2 model training fails with JSONL input data
I'm fine-tuning the Llama-2 13B model with a JSONL file and the training fails. I've also tried the 7B model, and I've enabled billing.
import replicate

DESTINATION_MODEL_NAME = 'deepakkumar07-debug/llama-midjournery'
TRAINING_DATA_URL = 'https://sangli-training-dataset.s3.ap-south-1.amazonaws.com/midjourney_replicate_dataset.jsonl'

training = replicate.trainings.create(
    version='meta/llama-2-13b:078d7a002387bd96d93b0302a4c03b3f15824b63104034bfa943c63a8f208c38',
    input={
        "train_data": TRAINING_DATA_URL,
    },
    destination=DESTINATION_MODEL_NAME,
)
print(training)
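For reference, the same failure can also be watched from the client instead of the web console. A minimal polling sketch, assuming the client's replicate.trainings.get call and the usual terminal status values ("succeeded", "failed", "canceled"); the error field on a failed training is an assumption borrowed from how predictions behave:

import time
import replicate

# Re-fetch the training until it reaches a terminal state.
training = replicate.trainings.get(training.id)
while training.status not in ("succeeded", "failed", "canceled"):
    time.sleep(30)
    training = replicate.trainings.get(training.id)

print(training.status)
if training.status == "failed":
    # Assumption: failed trainings expose an error message like predictions do.
    print(training.error)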
From the Replicate console I'm getting the following error:
Downloading weights to models/llama-2-13b/model_artifacts/training_weights...
Downloading weights...
Downloading https://weights.replicate.delivery/default/llama-2-13b/model-00001-of-00003.safetensors
Downloading https://weights.replicate.delivery/default/llama-2-13b/model-00002-of-00003.safetensors
Downloading https://weights.replicate.delivery/default/llama-2-13b/model-00003-of-00003.safetensors
Downloading https://weights.replicate.delivery/default/llama-2-13b/config.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/generation_config.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/model.safetensors.index.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/special_tokens_map.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/tokenizer_config.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/tokenizer.json
Downloading https://weights.replicate.delivery/default/llama-2-13b/tokenizer.model
[stdout]
models/llama-2-13b/model_artifacts/training_weights/model.safetensors.index.json took 0.563747s (59324 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/generation_config.json took 0.573989s (238 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/special_tokens_map.json took 0.581211s (707 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/tokenizer_config.json took 0.634014s (1175 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/config.json took 0.717872s (810 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/tokenizer.json took 0.831675s (2215729 bytes/sec)
[stdout]
Downloaded 500 kB bytes in 0.894s (559 kB/s)
[stdout]
Downloaded 6.2 GB bytes in 20.206s (306 MB/s)
[stdout]
Downloaded 9.9 GB bytes in 28.330s (350 MB/s)
[stdout]
Downloaded 9.9 GB bytes in 28.599s (348 MB/s)
Finished download in 32.95s
Local Output Dir: training_output
Number of GPUs: 8
Train.py Arguments:
['python3', '-m', 'torch.distributed.run', '--nnodes=1', '--nproc_per_node=8', 'llama_recipes/llama_finetuning.py', '--enable_fsdp', '--use_peft', '--model_name=models/llama-2-13b/model_artifacts/training_weights', '--pure_bf16', '--output_dir=training_output', '--pack_sequences=False', '--wrap_packed_sequences=False', '--chunk_size=2048', '--data_path=/tmp/tmpn81ez7iqcode_review_dataset.jsonl', '--num_epochs=1', '--batch_size_training=4', '--gradient_accumulation_steps=1', '--lr=0.0001', '--lora_rank=8', '--lora_alpha=16', '--lora_dropout=0.05', '--peft_method=lora', '--run_validation=True', '--num_validation_samples=50', '--validation_data_path=None', '--val_batch_size=1', '--validation_prompt=None', '--seed=42']
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Selecting observations 0 through -39 from data for training...
(identical tracebacks from all 8 worker ranks, interleaved and garbled in the raw log, omitted; one clean copy follows)
Traceback (most recent call last):
File "/src/llama_recipes/llama_finetuning.py", line 366, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/src/llama_recipes/llama_finetuning.py", line 119, in main
dataset_train = get_preprocessed_dataset(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/src/llama_recipes/utils/dataset_utils.py", line 43, in get_preprocessed_dataset
return DATASET_PREPROC[dataset_config.dataset](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/src/llama_recipes/ft_datasets/completion_dataset.py", line 102, in get_completion_dataset
dataset = format_data(dataset, tokenizer, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/src/llama_recipes/ft_datasets/completion_dataset.py", line 61, in format_data
if "text" in dataset[0]:
~~~~~~~^^^
File "/usr/local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2803, in __getitem__
return self._getitem(key)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2787, in _getitem
pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 583, in query_table
_check_valid_index_key(key, size)
File "/usr/local/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 526, in _check_valid_index_key
raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
--> Running with torch dist debug set to detail
[2023-11-30 14:00:52,287] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 689) of binary: /usr/local/bin/python3
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 810, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llama_recipes/llama_finetuning.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 690)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 691)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 692)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 693)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 694)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 695)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 696)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-11-30_14:00:52
host : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 689)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
result = predict(**payload)
^^^^^^^^^^^^^^^^^^
File "/src/train.py", line 230, in train
raise Exception(
Exception: Training failed with exit code 1! Check logs for details
@deepakkumar07-debug you are trying to access an element from a dataset that has no elements (size 0). This can happen for various reasons, such as empty data, incorrect pre-processing, or configuration issues.
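One quick way to rule out the "empty data" case is to validate the JSONL file locally before uploading. A minimal sketch; the "text" key is taken from the completion_dataset.py check in the traceback above, and the prompt/completion pair is an assumed alternative schema, so adjust the keys to whatever your trainer expects:

import json

# Hypothetical local copy of the uploaded dataset; substitute your own path.
path = "midjourney_replicate_dataset.jsonl"

rows = 0
with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # raises ValueError if a line is not valid JSON
        if "text" not in record and not {"prompt", "completion"} <= record.keys():
            print(f"line {lineno}: unexpected keys {sorted(record)}")
        rows += 1

print(f"{rows} usable records")

If this prints a plausible row count and no key warnings, the file itself is probably fine and the problem is downstream of the upload.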
The dataset has enough data, say 20 objects. What do you mean by incorrect pre-processing or configuration issues?
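For what it's worth, the log offers a more specific reading than "configuration issues". The Train.py arguments above include run_validation=True and num_validation_samples=50, and the line "Selecting observations 0 through -39 from data for training..." is consistent with the trainer reserving those 50 rows from the end of the data: len(data) - 50 == -39 would mean only 11 rows reached the trainer, and even 20 rows would leave a training split of size 0. A back-of-the-envelope sketch under that assumption (the exact slicing is guessed, not taken from the trainer's source):

num_validation_samples = 50  # from the Train.py arguments in the log

for dataset_size in (11, 20, 100):
    data = list(range(dataset_size))
    # Assumed behavior: the trainer keeps data[0 : size - num_validation_samples]
    # for training, so a file smaller than 50 rows leaves an empty split.
    train_split = data[0 : dataset_size - num_validation_samples]
    print(f"{dataset_size} rows -> training split of {len(train_split)}")

# 11 rows -> training split of 0   (11 - 50 == -39, matching the log)
# 20 rows -> training split of 0
# 100 rows -> training split of 50

If that reading is right, adding more rows, or lowering num_validation_samples / disabling validation (if the trainer exposes those as inputs), should get past the IndexError.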
@deepakkumar07-debug Were you able to solve this issue? I am currently stuck here too. Any solution would be helpful. Thanks a lot.
Struggling with the same error. :-(