Trainer's predict method throws an out-of-memory error on GPT-2 during testing
System Info
- transformers version: 4.39.2
- Python version: 3.12
- Platform: Linux
Who can help?
No response
Information
- [X] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
from datasets import load_dataset
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
gpt2model = AutoModelForCausalLM.from_pretrained("gpt2")
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

dataset = load_dataset("xsum")  # something like xsum but custom, where each example has a text and a summary column

# preprocess the data for summarization: concatenate text and summary with a TL;DR separator
def preprocess(data):
    data_combined = data["text"] + " TL;DR " + data["summary"]
    return tokenizer(data_combined)

def data2equal_size_tokens(data):
    # split the tokenized text into equal-sized context chunks
    # (chunking code omitted in the original report; the result is a DatasetDict
    #  with "train", "valid" and "test" splits of fixed-length token chunks)
    return tokenized_data_chunks
args = TrainingArguments(
    output_dir="gpt2_checkpoints",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    evaluation_strategy="epoch",
    eval_steps=1,
    logging_steps=1,
    gradient_accumulation_steps=8,
    num_train_epochs=100,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    save_total_limit=3,
    overwrite_output_dir=True,
)
trainer = Trainer(
    model=gpt2model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_data_chunks["train"],
    eval_dataset=tokenized_data_chunks["valid"],
)

trainer.train()
Training appears to run, although at some point the loss stops decreasing any further.
After training completes, calling trainer.predict(tokenized_data_chunks['test']) throws the following error:
---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
Cell In[63], line 1
----> 1 results = trainer.predict(tokenized_data_ctx_chunks['test'])
File /python3.12/site-packages/transformers/trainer.py:3441, in Trainer.predict(self, test_dataset, ignore_keys, metric_key_prefix)
3438 start_time = time.time()
3440 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3441 output = eval_loop(
3442 test_dataloader, description="Prediction", ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix
3443 )
3444 total_batch_size = self.args.eval_batch_size * self.args.world_size
3445 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File /python3.12/site-packages/transformers/trainer.py:3580, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3578 logits = self.preprocess_logits_for_metrics(logits, labels)
3579 logits = self.gather_function((logits))
-> 3580 preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
3582 if labels is not None:
3583 labels = self.gather_function((labels))
File /python3.12/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
--> 140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
141 elif isinstance(tensors, Mapping):
142 return type(tensors)(
143 {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
144 )
File /python3.12/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
96 tensor2 = atleast_1d(tensor2)
98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99 return torch.cat((tensor1, tensor2), dim=0)
101 # Let's figure out the new shape
102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]
OutOfMemoryError: CUDA out of memory. Tried to allocate 10.74 GiB. GPU 0 has a total capacity of 23.69 GiB of which 2.09 GiB is free. Including non-PyTorch memory, this process has 21.59 GiB memory in use. Of the allocated memory 12.16 GiB is allocated by PyTorch, and 9.12 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Expected behavior
Expected trainer.predict to run on the test set without running out of memory.
There are a number of other issues related to this error.
For instance, adding compute_metrics to the Trainer produces an OOM error during training.
Reducing config.n_ctx, config.n_positions or tokenizer.model_max_length from 1024 to 128 doesn't change anything.
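To make that attempt concrete, it was presumably something along these lines (a guess at the exact calls, not confirmed by the report; config is the AutoConfig loaded for GPT-2):

```python
# presumably what was tried: shrink the context-length settings after loading
config = AutoConfig.from_pretrained("gpt2")
config.n_ctx = 128
config.n_positions = 128
tokenizer.model_max_length = 128
```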
To avoid the OOM error during training we can add preprocess_logits_for_metrics, which does resolve the OOM errors during training, but then training seems to stagnate: all the metrics plateau at some point and never pick up again.
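For context, here is a minimal sketch of that preprocess_logits_for_metrics workaround (the function name keep_argmax_only is illustrative; the other names are reused from the reproduction above). It reduces the per-token logits to argmax ids before the Trainer gathers and concatenates them, so the full (batch, seq_len, vocab_size) logit tensors never pile up on the GPU:

```python
import torch

def keep_argmax_only(logits, labels):
    # drop the vocabulary dimension before the Trainer accumulates predictions;
    # only the predicted token ids are kept for metric computation
    if isinstance(logits, tuple):  # some model heads return extra tensors
        logits = logits[0]
    return logits.argmax(dim=-1)

trainer = Trainer(
    model=gpt2model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_data_chunks["train"],
    eval_dataset=tokenized_data_chunks["valid"],
    preprocess_logits_for_metrics=keep_argmax_only,
)
```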
Adding tokens to the tokenizer before tokenizing the data and training also results in errors. For instance, one can add tokens with tokenizer.add_special_tokens({'pad_token': '<|pad|>', 'sep_token': '<|sep|>', 'bos_token': '<|startoftext|>'}). Training with trainer.train() then only proceeds if no compute_metrics is passed to the Trainer.
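For what it's worth, the usual recipe when adding special tokens to GPT-2 also resizes the embedding matrix; a minimal sketch, assuming the tokenizer and gpt2model objects from the reproduction above:

```python
tokenizer.add_special_tokens({
    "pad_token": "<|pad|>",
    "sep_token": "<|sep|>",
    "bos_token": "<|startoftext|>",
})
# GPT-2's embedding matrix only covers the original vocabulary, so the model
# must be resized after adding tokens, otherwise the new token ids cannot be embedded
gpt2model.resize_token_embeddings(len(tokenizer))
```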
Once training is over, calling model.generate raises an error about max_new_tokens not being set, since our context length is 128 instead of the model's original 1024.
Besides that, there is a warning message stating the following:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
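For reference, a minimal, hedged generation sketch that addresses both messages: switch the tokenizer to left padding and pass max_new_tokens explicitly (the prompt string below is purely illustrative):

```python
import torch

# decoder-only models should be left-padded for batched generation
tokenizer.padding_side = "left"

prompt = "Some article text ... TL;DR "  # illustrative prompt only
inputs = tokenizer(prompt, return_tensors="pt").to(gpt2model.device)

with torch.no_grad():
    out = gpt2model.generate(
        **inputs,
        max_new_tokens=64,                # bound the summary length explicitly
        pad_token_id=tokenizer.pad_token_id,
    )

# decode only the newly generated tokens after the prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```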
Did I do something wrong in the preprocessing step of the data?
Do we need to add bos_token when preprocessing the data?
Is the format that I'm using for the data preprocessing correct for summarization?
cc @muellerzr this might be specific to the predict function running on the whole test set, which is bound to run OOM?
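One mitigation in the meantime (a sketch, assuming the same TrainingArguments as in the report) is to stop the prediction loop from holding everything on the GPU: set eval_accumulation_steps so accumulated predictions are moved to the CPU periodically, and shrink the eval batch size:

```python
args = TrainingArguments(
    output_dir="gpt2_checkpoints",
    per_device_eval_batch_size=8,   # smaller eval batches
    eval_accumulation_steps=4,      # move accumulated predictions to the CPU every 4 eval steps
    # ... remaining arguments as in the original report ...
)
```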
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I believe https://github.com/huggingface/transformers/pull/28769 implemented a fix!