litgpt
Issue with Dolly Dataloader: `context` key not found!
Bug description
I ran into the following issue while running LoRA fine-tuning.
Stack Trace
```
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index) # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/base.py", line 80, in __getitem__
    example = self.transform(example)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/dolly.py", line 74, in _transform
    item["input"] = item.pop("context")
                    ^^^^^^^^^^^^^^^^^^^
KeyError: 'context'
```
Command
```
litgpt finetune_lora checkpoints/EleutherAI/pythia-70m --data Dolly --precision 16-mixed --data.num_workers 4 --train.global_batch_size 1 --train.max_seq_length 512 --data.val_split_fraction 0.0
```
I spent some time debugging it. It seems like the `_transform` method is being called twice on the same item at the beginning, for some reason. During the second call, the keys are no longer there because we are using `pop`. It does work with `get`, though.
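A minimal sketch of why a second call fails with `pop` but not with `get`, assuming the dataset hands the same stored dict object to the transform on every access (the traceback shows `litgpt/data/base.py` passing `example` straight to `self.transform`, but whether it copies the record first is not confirmed here). `ToyDataset`, `transform_pop`, `transform_get`, and `make_data` are hypothetical names for illustration only.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that stores raw dicts and applies a transform on access."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __getitem__(self, idx):
        example = self.data[idx]  # assumption: the same dict object is returned on every access
        return self.transform(example)


def transform_pop(item: dict) -> dict:
    # Mirrors the pop-based renaming: removes "context"/"response" from the stored dict itself.
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


def transform_get(item: dict) -> dict:
    # get() reads the keys without removing them, so a repeated call still finds them.
    item["input"] = item.get("context")
    item["output"] = item.get("response")
    return item


def make_data():
    return [{"instruction": "Say hi", "context": "", "response": "hi"}]


ds = ToyDataset(make_data(), transform_pop)
ds[0]  # first access succeeds
try:
    ds[0]  # second access: "context" was already popped from the stored dict
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'context'

ds = ToyDataset(make_data(), transform_get)
ds[0]
print(sorted(ds[0]))  # prints: ['context', 'input', 'instruction', 'output', 'response']
```

With `get`, the record keeps both the old and the new key names, which is why the repeated call does not raise.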
In `src/litgpt/litgpt/data/dolly.py` (the commented-out parts are for debugging):
```python
# import sys
# from pprint import pprint
def _transform(idx: int, item: dict) -> dict:
    # if "context" not in item.keys():
    #     print(f"{idx}: Missing Key!")
    #     pprint(item)
    #     sys.exit()
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item
```
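One possible workaround, sketched here as an assumption rather than the upstream fix, is to build a new dict instead of popping from the stored one. Unlike the `get` variant, it still drops the old `context`/`response` names, so the output has the same shape as the `pop` version, but the stored record is never modified and repeated calls give the same result. `transform_without_pop` is a hypothetical name.

```python
def transform_without_pop(item: dict) -> dict:
    """Hypothetical non-mutating version of the Dolly field renaming."""
    # Copy every field except the two being renamed, leaving `item` untouched.
    new_item = {k: v for k, v in item.items() if k not in ("context", "response")}
    new_item["input"] = item["context"]
    new_item["output"] = item["response"]
    return new_item


record = {"instruction": "Say hi", "context": "", "response": "hi"}
first = transform_without_pop(record)
second = transform_without_pop(record)  # a repeated call on the same record still works
print(first == second)      # prints: True
print("context" in record)  # prints: True (the stored record was not modified)
print(sorted(first))        # prints: ['input', 'instruction', 'output']
```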
I couldn't figure out why it is being called twice, though.
What operating system are you using?
macOS
LitGPT Version
Tested on two versions, and on two platforms (macOS and Linux):
litgpt 0.4.13
litgpt 0.4.14.dev1