BlenderBot-Distil-400M training fails if the input or target length exceeds a certain threshold, even when truncation and padding are enabled
System Info
- transformers version: 4.20.1, 4.21.0
- Platform: Linux
- Python version: 3.7.6
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.10.2 (Yes)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes (2+ Tesla V100)
- Using distributed or parallel set-up in script?: No
Who can help?
@patil-suraj
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Run the following script with `python script_blenderbot_length.py`:
# The contents of script_blenderbot_length.py
# To make the code crash, set CRITICAL_NUMBER=64
# To make it pass, set CRITICAL_NUMBER=63
# The code fails if EITHER the input or the target is repeated 64+ times.
from __future__ import annotations
import functools
import typing as tp
import datasets
import transformers
from transformers import (
    DataCollatorForSeq2Seq,
    PreTrainedTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)
CRITICAL_NUMBER = 64
increment_en = [
    {"input": "One", "target": "Two"},
    {"input": "Three " * 2, "target": "Four " * 2},
    {"input": "Five " * 4, "target": "Six " * 4},
    {"input": "Seven " * 8, "target": "Eight " * 8},
    {"input": "Nine " * CRITICAL_NUMBER, "target": "Ten " * CRITICAL_NUMBER},
]
increment_en = increment_en * 100
def lod_to_dol(list_of_dicts: tp.List[tp.Dict[str, tp.Any]]) -> tp.Dict[str, list]:
    dict_of_lists = {
        key: [dct[key] for dct in list_of_dicts] for key in list_of_dicts[0]
    }
    return dict_of_lists
increment_en = lod_to_dol(increment_en)
def preprocess_function_(
    examples,
    tokenizer: PreTrainedTokenizer,
    max_input_length: int,
    max_target_length: int,
):
    inputs = examples["input"]
    targets = examples["target"]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)
    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
def main():
    tokenizer = transformers.BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    model = transformers.BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")
    args = Seq2SeqTrainingArguments(
        "script_debug",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        fp16=True,
        push_to_hub=False,
        max_steps=10000,
        logging_steps=5000,
        save_steps=5000,
    )
    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)
    dataset = datasets.DatasetDict(
        {
            "train": datasets.Dataset.from_dict(increment_en),
            "test": datasets.Dataset.from_dict(increment_en),
        }
    )
    preprocess_function = functools.partial(
        preprocess_function_,
        tokenizer=tokenizer,
        max_input_length=512,
        max_target_length=512,
    )
    processed_ds = dataset.map(preprocess_function, batched=True)
    processed_ds.set_format(
        type="torch", columns=["input_ids", "attention_mask", "labels"]
    )
    trainer = Seq2SeqTrainer(
        model,
        args,
        train_dataset=processed_ds["train"],
        eval_dataset=processed_ds["test"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()
Running the code with CRITICAL_NUMBER set to 64 or greater leads to this bizarre series of CUDA asserts:
<Similar messages appear above, which are omitted for brevity>
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0%| | 0/10000 [00:07<?, ?it/s]
root@bolt-imq45r3c3y-8dfzr73qqa:/mnt/task_runtime# python script_blenderbot_length.py
100%|██████████████████████████| 1/1 [00:00<00:00, 5.30ba/s]
100%|██████████████████████████| 1/1 [00:00<00:00, 5.72ba/s]
max_steps is given, it will override any value given in num_train_epochs
Using cuda_amp half precision backend
The following columns in the training set don't have a corresponding argument in `BlenderbotForConditionalGeneration.forward` and have been ignored: target, input. If target, input are not expected by `BlenderbotForConditionalGeneration.forward`, you can safely ignore this message.
/miniconda/lib/python3.7/site-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
FutureWarning,
***** Running training *****
Num examples = 500
Num Epochs = 313
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 10000
0%|          | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "script_blenderbot_length.py", line 101, in <module>
main()
File "script_blenderbot_length.py", line 97, in main
trainer.train()
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1502, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2470, in training_step
loss = self.compute_loss(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2502, in compute_loss
outputs = model(**inputs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1340, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1181, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 785, in forward
output_attentions=output_attentions,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 318, in forward
output_attentions=output_attentions,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 180, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Expected behavior
The training code should not crash, especially when the tokenized sequences are far shorter than the max_length=512 truncation limit.
Adding `padding=True` when tokenizing both the inputs and the targets does not fix the issue.
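Concretely, "adding `padding=True`" means changing the tokenizer calls in preprocess_function_ to roughly the following (on top of the dynamic padding already done by DataCollatorForSeq2Seq); the crash is identical:

    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding=True)
    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True, padding=True)
    model_inputs["labels"] = labels["input_ids"]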
Also, when running the script using the CPU only, I get this error:
root@pc:~ # CUDA_VISIBLE_DEVICES="" python script_blenderbot_length.py
100%|██████████████████████████| 1/1 [00:00<00:00, 4.95ba/s]
100%|██████████████████████████| 1/1 [00:00<00:00, 5.46ba/s]
max_steps is given, it will override any value given in num_train_epochs
The following columns in the training set don't have a corresponding argument in `BlenderbotForConditionalGeneration.forward` and have been ignored: target, input. If target, input are not expected by `BlenderbotForConditionalGeneration.forward`, you can safely ignore this message.
/miniconda/lib/python3.7/site-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
FutureWarning,
***** Running training *****
Num examples = 500
Num Epochs = 80
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 4
Gradient Accumulation steps = 1
Total optimization steps = 10000
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "script_blenderbot_length.py", line 103, in <module>
main()
File "script_blenderbot_length.py", line 99, in main
trainer.train()
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1502, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2470, in training_step
loss = self.compute_loss(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2502, in compute_loss
outputs = model(**inputs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1340, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1181, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 738, in forward
embed_pos = self.embed_positions(input_shape)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 125, in forward
return super().forward(positions)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 160, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
0%| | 0/10000 [00:00<?, ?it/s]
I've found out why the error appears. I modified `BlenderbotLearnedPositionalEmbedding.forward` in transformers/src/transformers/models/blenderbot/modeling_blenderbot.py (around line 125):
        positions = torch.arange(
            past_key_values_length, past_key_values_length + seq_len, dtype=torch.long, device=self.weight.device
        )
+       print(positions)
+       print(self.weight.shape)
        return super().forward(positions)
When running the script, I get this in the output:
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
126, 127, 128, 129])
torch.Size([128, 1280])
Clearly, the position indices go beyond the range of the positional embedding table, which only has 128 rows. The question is why... Perhaps this can be configured in the constructor?
The length of the positions seems to be equal to 2*CRITICAL_NUMBER + 1.
And... it goes to a maximum of the tokenizer's max_length-1, which is expected, I guess.
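A quick sanity check of that observation, run outside the training script (a minimal sketch using the same tokenizer as above):

    from transformers import BlenderbotTokenizer

    tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    for n in (1, 8, 63, 64):
        ids = tokenizer("Nine " * n, max_length=512, truncation=True)["input_ids"]
        # Per the observation above the length comes out to 2*n + 1, so n=64
        # already yields 129 tokens -- more than the 128 positions in the table printed above.
        print(n, len(ids))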
Ah. So the issue is that in the BlenderbotConfig, max_position_embeddings is set to 128. The publicly available weights only have position embeddings with those dimensions, so either I'd have to train from scratch or reduce the max tokenizer length to 128.
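In the script above, the second option boils down to replacing the hard-coded 512 with the value from the model config, roughly like this (sketch only):

    # Cap both limits at the model's position-embedding budget instead of 512.
    max_len = model.config.max_position_embeddings  # 128 for facebook/blenderbot-400M-distill
    preprocess_function = functools.partial(
        preprocess_function_,
        tokenizer=tokenizer,
        max_input_length=max_len,
        max_target_length=max_len,
    )

Capping the lengths this way keeps the position indices within the 128-row embedding table.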
But seriously, this exception should be caught and re-raised with a more human-readable message.
(I can contribute a fix after my internship ends, not before)
Catching and re-raising the exception during GPU training doesn't produce a more human-readable message (it's still RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle), but at least the flood of CUDA asserts is gone). Getting a human-readable exception seems to be possible only with CPU-only training.
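One possible shape for such a check, as a sketch of the idea rather than a tested patch: validate the position indices eagerly in `BlenderbotLearnedPositionalEmbedding.forward` (the same method patched above), before the lookup that dies, so a readable error surfaces on both CPU and GPU:

        # Sketch: fail with a readable message before the out-of-range embedding lookup.
        if positions.numel() > 0 and int(positions.max()) >= self.num_embeddings:
            raise ValueError(
                f"Input sequence produces position index {int(positions.max())}, but this model "
                f"only has {self.num_embeddings} learned positions (config.max_position_embeddings). "
                "Truncate or shorten the inputs/targets."
            )
        return super().forward(positions)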
cc @sgugger for usage with the Trainer!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.