Training Loop Error

Open vrunm opened this issue 2 years ago • 5 comments

System Info

  • `transformers` version: 4.27.2

  • Platform: Linux-5.15.89+-x86_64-with-debian-bullseye-sid

  • Python version: 3.7.12

  • Huggingface_hub version: 0.12.1

  • PyTorch version (GPU?): 1.13.0+cpu (False)

  • Tensorflow version (GPU?): 2.11.0 (False)

  • Flax version (CPU?/GPU?/TPU?): 0.6.4 (cpu)

  • Jax version: 0.3.25

  • JaxLib version: 0.3.25

  • Using GPU in script?:

  • Using distributed or parallel set-up in script?:

Who can help?

@amyeroberts

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/conda/lib/python3.7/site-packages/transformers/utils/import_utils.py:1110 in _get_module    │
│                                                                                                  │
│   1107 │   │   │   │   result.append(attr)                                                       │
│   1108 │   │   return result                                                                     │
│   1109 │                                                                                         │
│ ❱ 1110 │   def __getattr__(self, name: str) -> Any:                                              │
│   1111 │   │   if name in self._objects:                                                         │
│   1112 │   │   │   return self._objects[name]                                                    │
│   1113 │   │   if name in self._modules:                                                         │
│                                                                                                  │
│ /opt/conda/lib/python3.7/importlib/__init__.py:127 in import_module                              │
│                                                                                                  │
│   124 │   │   │   if character != '.':                                                           │
│   125 │   │   │   │   break                                                                      │
│   126 │   │   │   level += 1                                                                     │
│ ❱ 127 │   return _bootstrap._gcd_import(name[level:], package, level)                            │
│   128                                                                                            │
│   129                                                                                            │
│   130 _RELOADING = {}                                                                            │
│ in _gcd_import                                                                                   │
│ in _find_and_load                                                                                │
│ in _find_and_load_unlocked                                                                       │
│ in _load_unlocked                                                                                │
│ in exec_module                                                                                   │
│ in _call_with_frames_removed                                                                     │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/trainer_seq2seq.py:22 in <module>            │
│                                                                                                  │
│    19 from torch.utils.data import Dataset                                                       │
│    20                                                                                            │
│    21 from .deepspeed import is_deepspeed_zero3_enabled                                          │
│ ❱  22 from .trainer import Trainer                                                               │
│    23 from .trainer_utils import PredictionOutput                                                │
│    24 from .utils import logging                                                                 │
│    25                                                                                            │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/trainer.py:73 in <module>                    │
│                                                                                                  │
│     70 from .debug_utils import DebugOption, DebugUnderflowOverflow                              │
│     71 from .deepspeed import deepspeed_init, is_deepspeed_zero3_enabled                         │
│     72 from .dependency_versions_check import dep_version_check                                  │
│ ❱   73 from .modelcard import TrainingSummary                                                    │
│     74 from .modeling_utils import PreTrainedModel, load_sharded_checkpoint, unwrap_model        │
│     75 from .models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES, MODEL_MAPPING_  │
│     76 from .optimization import Adafactor, get_scheduler                                        │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/modelcard.py:32 in <module>                  │
│                                                                                                  │
│    29 from huggingface_hub.utils import HFValidationError                                        │
│    30                                                                                            │
│    31 from . import __version__                                                                  │
│ ❱  32 from .models.auto.modeling_auto import (                                                   │
│    33 │   MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING_NAMES,                                          │
│    34 │   MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,                                                     │
│    35 │   MODEL_FOR_CTC_MAPPING_NAMES,                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'MODEL_FOR_ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING_NAMES' from 
'transformers.models.auto.modeling_auto' 
(/opt/conda/lib/python3.7/site-packages/transformers/models/auto/modeling_auto.py)

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>                                                                                      │
│                                                                                                  │
│ ❱  1 from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments                           │
│    2                                                                                             │
│    3 output_dir="lora-flan-t5-xxl"                                                               │
│    4                                                                                             │
│ in _handle_fromlist                                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/utils/import_utils.py:1100 in __getattr__    │
│                                                                                                  │
│   1097 │   │   self._name = name                                                                 │
│   1098 │   │   self._import_structure = import_structure                                         │
│   1099 │                                                                                         │
│ ❱ 1100 │   # Needed for autocompletion in an IDE                                                 │
│   1101 │   def __dir__(self):                                                                    │
│   1102 │   │   result = super().__dir__()                                                        │
│   1103 │   │   # The elements of self.__all__ that are submodules may or may not be in the dir   │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/utils/import_utils.py:1115 in _get_module    │
│                                                                                                  │
│   1112 │   │   │   return self._objects[name]                                                    │
│   1113 │   │   if name in self._modules:                                                         │
│   1114 │   │   │   value = self._get_module(name)                                                │
│ ❱ 1115 │   │   elif name in self._class_to_module.keys():                                        │
│   1116 │   │   │   module = self._get_module(self._class_to_module[name])                        │
│   1117 │   │   │   value = getattr(module, name)                                                 │
│   1118 │   │   else:                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Failed to import transformers.trainer_seq2seq because of the following error (look up to see its 
traceback):
cannot import name 'MODEL_FOR_ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING_NAMES' from 
'transformers.models.auto.modeling_auto' 
(/opt/conda/lib/python3.7/site-packages/transformers/models/auto/modeling_auto.py)

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSeq2SeqLM

# huggingface hub model id
model_id = "philschmid/flan-t5-xxl-sharded-fp16"

# load model from the hub
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

# Define LoRA Config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)
# prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()



from transformers import DataCollatorForSeq2Seq

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100
# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

output_dir="lora-flan-t5-xxl"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # higher learning rate
    num_train_epochs=5,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=500,
    save_strategy="no",
    report_to="tensorboard",
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset["train"],
)
model.config.use_cache = False 

Expected behavior

The FLAN-T5 XXL model should train, with the training loop running for 5 epochs.

vrunm avatar Mar 24 '23 10:03 vrunm

I just tried an install on a fresh environment of Transformers v4.27.2 and I cannot reproduce this. Can you maybe retry a fresh install? The constant not found is definitely in that module and it's a basic dict.

sgugger avatar Mar 24 '23 11:03 sgugger

@sgugger I did try a fresh environment but still ran into the same issue.

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>                                                                                      │
│                                                                                                  │
│   4 model_id = "philschmid/flan-t5-xxl-sharded-fp16"                                             │
│   5                                                                                              │
│   6 # load model from the hub                                                                    │
│ ❱ 7 model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")                   │
│   8                                                                                              │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/models/auto/auto_factory.py:472 in           │
│ from_pretrained                                                                                  │
│                                                                                                  │
│   469 │   │   elif type(config) in cls._model_mapping.keys():                                    │
│   470 │   │   │   model_class = _get_model_class(config, cls._model_mapping)                     │
│   471 │   │   │   return model_class.from_pretrained(                                            │
│ ❱ 472 │   │   │   │   pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs,   │
│   473 │   │   │   )                                                                              │
│   474 │   │   raise ValueError(                                                                  │
│   475 │   │   │   f"Unrecognized configuration class {config.__class__} for this kind of AutoM   │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/modeling_utils.py:2662 in from_pretrained    │
│                                                                                                  │
│   2659 │   │   │   │   offload_state_dict=offload_state_dict,                                    │
│   2660 │   │   │   │   dtype=torch_dtype,                                                        │
│   2661 │   │   │   │   load_in_8bit=load_in_8bit,                                                │
│ ❱ 2662 │   │   │   │   keep_in_fp32_modules=keep_in_fp32_modules,                                │
│   2663 │   │   │   )                                                                             │
│   2664 │   │                                                                                     │
│   2665 │   │   model.is_loaded_in_8bit = load_in_8bit                                            │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/transformers/modeling_utils.py:2742 in                    │
│ _load_pretrained_model                                                                           │
│                                                                                                  │
│   2739 │   │   │   is_safetensors = archive_file.endswith(".safetensors")                        │
│   2740 │   │   │   if offload_folder is None and not is_safetensors:                             │
│   2741 │   │   │   │   raise ValueError(                                                         │
│ ❱ 2742 │   │   │   │   │   "The current `device_map` had weights offloaded to the disk. Please   │
│   2743 │   │   │   │   │   " for them. Alternatively, make sure you have `safetensors` installe  │
│   2744 │   │   │   │   │   " offers the weights in this format."                                 │
│   2745 │   │   │   │   )                                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for 
them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in 
this format.

vrunm avatar Mar 24 '23 11:03 vrunm

This is not the same issue as above. Just follow the error message and provide an offload_folder for your model as you don't have enough GPU and CPU memory to host it. Note that you won't be able to train that large model on your setup.

sgugger avatar Mar 24 '23 11:03 sgugger
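
The suggestion to provide an `offload_folder` can be sketched as follows. This is a minimal sketch, not a confirmed fix for this setup: it assumes the `accelerate` package is installed (needed for `device_map="auto"`), and the directory name `offload` and the helper names `offload_load_kwargs` and `load_model` are hypothetical. It only addresses loading; as noted above, training a model of this size on this hardware is still not expected to work.

```python
from typing import Any, Dict


def offload_load_kwargs(offload_dir: str = "offload") -> Dict[str, Any]:
    """Build keyword arguments for from_pretrained that allow disk offload."""
    return {
        "device_map": "auto",           # same setting as in the reproduction
        "offload_folder": offload_dir,  # directory name is an assumption
    }


def load_model(model_id: str, offload_dir: str = "offload"):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM

    return AutoModelForSeq2SeqLM.from_pretrained(
        model_id, **offload_load_kwargs(offload_dir)
    )


if __name__ == "__main__":
    # Only prints the arguments; calling load_model would download the weights.
    print(offload_load_kwargs())
```

Calling `load_model("philschmid/flan-t5-xxl-sharded-fp16")` would then pass both arguments to `from_pretrained`, so weights that do not fit in GPU or CPU memory can be written to the `offload` directory.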

@sgugger Thanks, I got that. How can large models be trained in that case? Earlier I also tried smaller models and used the inference API.

vrunm avatar Mar 24 '23 11:03 vrunm

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 23 '23 15:04 github-actions[bot]

input_ids2 = []
attention_masks2 = []

# For every tweet...
for tweet in tweets:
    # encode_plus will:
    #   (1) Tokenize the sentence.
    #   (2) Prepend the [CLS] token to the start.
    #   (3) Append the [SEP] token to the end.
    #   (4) Map tokens to their IDs.
    #   (5) Pad or truncate the sentence to max_length
    #   (6) Create attention masks for [PAD] tokens.
    encoded_dict2 = tokenizer2.encode_plus(
        tweet,                         # Sentence to encode.
        add_special_tokens = True,     # Add '[CLS]' and '[SEP]'
        max_length = max_len,          # Pad & truncate all sentences.
        pad_to_max_length = True,
        return_attention_mask = True,  # Construct attn. masks.
        return_tensors = 'pt',         # Return pytorch tensors.
    )

    # Add the encoded sentence to the list.
    input_ids2.append(encoded_dict2['input_ids'])

    # And its attention mask (simply differentiates padding from non-padding).
    attention_masks2.append(encoded_dict2['attention_mask'])

# Convert the lists into tensors.
input_ids2 = torch.cat(input_ids, dim=0)
attention_masks2 = torch.cat(attention_masks, dim=0)
labels = torch.tensor(labels)

# Print sentence 0, now as a list of IDs.
print('Original: ', tweets[0])
print('Token IDs from the mentalBert:', input_ids[0])

TypeError: cat() received an invalid combination of arguments - got (Tensor, dim=int), but expected one of:

  • (tuple of Tensors tensors, int dim, *, Tensor out)
  • (tuple of Tensors tensors, name dim, *, Tensor out)

Adnankramat avatar Aug 15 '23 10:08 Adnankramat
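
The `TypeError` in the last comment says that `torch.cat` received a single `Tensor` where it expects a sequence of tensors. One likely cause, offered here as an assumption rather than a confirmed diagnosis, is that the loop fills the lists `input_ids2` and `attention_masks2`, while the later `torch.cat` calls are given `input_ids` and `attention_masks`, which may already be tensors from an earlier step. A minimal sketch of the difference, using small hypothetical stand-in tensors instead of real tokenizer output:

```python
import torch

# Hypothetical stand-ins for the per-tweet encodings, each of shape (1, max_len).
input_ids2 = [
    torch.zeros(1, 4, dtype=torch.long),
    torch.ones(1, 4, dtype=torch.long),
]

# torch.cat expects a sequence of tensors, so passing the list works.
stacked = torch.cat(input_ids2, dim=0)
print(tuple(stacked.shape))

# Passing a single tensor instead of a list is what the error message describes.
try:
    torch.cat(stacked, dim=0)
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)
```

If that assumption holds, concatenating the lists that the loop actually filled (`input_ids2` and `attention_masks2`) should avoid the error.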