ValueError: Batch does not contain any data (`None`). At the end of all iterable data available before expected stop iteration.
Hey @loubnabnl,
Thanks for this repo - I've learned a lot from what you implemented here.
I am encountering a strange error when I attempt to use the command:
python santacoder-finetuning/train.py \
--model_path="bigcode/santacoder" \
--dataset_name="json" \
--subset="./mydataset/" \
--data_column "content" \
--split="train" \
--seq_length 2048 \
--max_steps 1000 \
--batch_size 2 \
--gradient_accumulation_steps 4 \
--learning_rate 5e-5 \
--num_warmup_steps 100 \
--eval_freq 100 \
--save_freq 100 \
--log_freq 1 \
--no_fp16 \
--fim_rate 0.5 \
--fim_spm_rate 0.5
If I run this, I end up getting an error that says:
ValueError: Batch does not contain any data (`None`). At the end of all iterable data available before expected stop iteration.
I hit this error when I pass through the 0.1 mark of the epoch:
{'loss': 0.2896, 'learning_rate': 4.9e-05, 'epoch': 0.1}
{'loss': 0.2095, 'learning_rate': 4.9500000000000004e-05, 'epoch': 0.1}
{'loss': 0.291, 'learning_rate': 5e-05, 'epoch': 0.1}
Traceback (most recent call last):
File "../santacoder-finetuning/train.py", line 289, in <module>
File "../santacoder-finetuning/train.py", line 279, in main
run_training(args, train_dataset, eval_dataset)
File "../santacoder-finetuning/train.py", line 268, in run_training
trainer.train()
File " /lib/python3.10/site-packages/transformers/trainer.py", line 1556, in train
return inner_training_loop(
File " /lib/python3.10/site-packages/transformers/trainer.py", line 1930, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File " /lib/python3.10/site-packages/transformers/trainer.py", line 2257, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File " /lib/python3.10/site-packages/transformers/trainer.py", line 2982, in evaluate
output = eval_loop(
File " /lib/python3.10/site-packages/transformers/trainer.py", line 3161, in evaluation_loop
for step, inputs in enumerate(dataloader):
File " /lib/python3.10/site-packages/accelerate/data_loader.py", line 582, in __iter__
raise ValueError(
ValueError: Batch does not contain any data (`None`). At the end of all iterable data available before expected stop iteration.
My train and test dataset is small and looks like this:
Size of the train set: 295. Size of the validation set: 2
74%|█████████▉ | 295/400 [00:00<00:00, 452.00it/s]
The character to token ratio of the dataset is: 3.90
Do you have any thoughts on what I need to do to adjust the training loop? Is it because my train set is too small?
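For what it's worth, the numbers above hint at one possible failure mode: with only 2 validation samples and seq_length=2048, a token-packing iterable dataset (like the ConstantLengthDataset this repo uses) may never accumulate enough tokens to emit even one fixed-length example, so the eval dataloader yields nothing. A minimal sketch of that idea, using a simplified stand-in packer (the function below is hypothetical, not the repo's actual class):

```python
# Hypothetical, simplified version of a token-packing iterable dataset:
# it concatenates tokenized examples and only yields full seq_length chunks.
def pack(token_lists, seq_length):
    buffer = []
    for tokens in token_lists:
        buffer.extend(tokens)
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
    # leftover tokens shorter than seq_length are silently dropped

# Two short validation documents (~300 tokens each) never reach 2048 tokens,
# so the packer yields zero examples -> an empty eval dataloader.
short_docs = [list(range(300)), list(range(300))]
print(len(list(pack(short_docs, 2048))))  # 0 packed eval examples
print(len(list(pack(short_docs, 128))))   # 4 chunks once seq_length is small enough
```

If the real eval set similarly packs down to zero sequences, an empty dataloader at the first eval step would line up with the traceback above, where the error is raised inside `evaluation_loop`.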
Thanks! Adam
Thanks @muellerzr for your quick response. I tried the following modifications:
--data_column "content" \
--split="train" \
--seq_length 2048 \
--max_steps 100 \
--batch_size 2 \
--gradient_accumulation_steps 4 \
--learning_rate 5e-5 \
--num_warmup_steps 10 \
--eval_freq 10 \
--save_freq 10 \
--log_freq 1 \
--bf16 \
--fim_rate 0.5 \
--fim_spm_rate 0.5
I kept batch_size=2, but when I decreased max_steps to 100 it still failed with the same error. Is that what you meant: lowering max_steps below the total number of train samples?
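As a back-of-the-envelope check on how max_steps relates to data volume (assuming, as in packing-style training loops, that one optimizer step consumes batch_size * gradient_accumulation_steps packed sequences, and that the train dataset can loop over the raw samples indefinitely):

```python
# Rough arithmetic for the second run above. These are just the CLI values;
# the consumption model (sequences per step) is an assumption.
max_steps = 100
batch_size = 2
grad_accum = 4

sequences_consumed = max_steps * batch_size * grad_accum
print(sequences_consumed)  # 800 packed sequences of seq_length tokens
```

Since a packing train dataset typically cycles over its source, max_steps exceeding the number of raw train samples should not by itself produce an empty batch; that points back at the eval side rather than the train side.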
Well... I just ran the command below, and it went fine for this dataset. It is very strange; I don't understand why it works for this dataset and not mine.
--model_path="bigcode/santacoder" \
--dataset_name="bigcode/the-stack-dedup" \
--subset="data/python" \
--output_dir "./checkpoints/santacoder-the-stack-dedup-python-debug-298-samples" \
--data_column "content" \
--split="train[0:298]" \
--seq_length 2048 \
--max_steps 1000 \
--batch_size 2 \
--gradient_accumulation_steps 8 \
--learning_rate 5e-5 \
--num_warmup_steps 10 \
--eval_freq 100 \
--save_freq 100 \
--log_freq 1 \
--bf16 \
--fim_rate 0.5 \
--fim_spm_rate 0.5
Do you have any words of wisdom on this? Is there any way to unit test this piece with my data so I can understand the root cause? I do not understand why the batch is `None` when it gets to your updated code adjustment.
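One way to isolate this without a full training run is to bypass the Trainer and drain the eval dataset yourself, counting what actually comes out. A dependency-free sketch of the idea (the helper below is illustrative; in practice you would pass it the eval dataset that train.py builds):

```python
def count_batches(iterable_dataset, batch_size):
    """Drain an iterable dataset the way a dataloader would and report
    (number of examples, number of batches including a partial final one)."""
    examples = 0
    for _ in iterable_dataset:
        examples += 1
    # ceil division: a partial final batch still counts as one batch
    return examples, -(-examples // batch_size)

# A packing dataset that emits nothing gives (0, 0) here, which is exactly
# the "Batch does not contain any data (`None`)" situation at eval time.
print(count_batches(iter([]), 2))        # (0, 0)
print(count_batches(iter(range(5)), 2))  # (5, 3)
```

If this reports zero examples for your validation split, the fix would be on the data side (more/longer validation samples, or a smaller seq_length) rather than in the training loop itself.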