`AcceleratorState` object has no attribute `distributed_type`.
System Info
accelerate-0.30.1
Google Colab
numpy-1.25.2
torch-2.2.1+cu121
Python 3.10.12
Regarding the accelerate configuration: I am using the `transformers` Trainer, which employs accelerate internally, and I do not touch the configuration.
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported
`no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
The args.json file employed below is available to download at: https://drive.google.com/file/d/1H2MstSq_oz7Xv7spMZCppf39fHGs5rW0/view?usp=drive_link.
The dataset specified in the args.json is the file: https://drive.google.com/file/d/18OVilNSqQQogSMiepe87vtNmzYpCalCs/view?usp=drive_link
In Google Colab, I ran the following:
!git clone https://github.com/evelinamorim/Seq2seqCoref.git
!pip install -U transformers accelerate
import sys
sys.path.insert(1, "Seq2seqCoref")
from transformers import HfArgumentParser, set_seed
from transformers import AutoModelForSeq2SeqLM, \
DataCollatorForSeq2Seq, AutoConfig, AutoTokenizer
from transformers.integrations import TensorBoardCallback
from arguments import DataArguments, ModelArguments, CorefTrainingArguments \
as TrainingArguments
from constants import SPEAKER_START, SPEAKER_END, MENTION_START, MENTION_END, \
COPY, CLUSTER_NEW, CLUSTERS, SENTENCE_START, SENTENCE_END, SPECIAL_IDS, \
NON_INT_SPECIAL_IDS, MARK_SPECIAL_IDS, MENTION_END_NON_INT_SPECIAL_IDS, \
MENTION_ENDS
from data import CorefDataset
from trainer import CorefTrainer
import os
parser = HfArgumentParser(
(ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_json_file(
json_file=os.path.abspath("args.json"))
set_seed(training_args.seed)
# tokenizer setup
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)
num_new_tokens = tokenizer.add_tokens([SPEAKER_START, SPEAKER_END,
MENTION_START, MENTION_END,
COPY])
num_new_tokens += tokenizer.add_tokens([SENTENCE_START, SENTENCE_END])
# loading config and model
config = AutoConfig.from_pretrained(model_args.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(
model_args.model_name_or_path, config=config)
# data objects
collator = DataCollatorForSeq2Seq(tokenizer, model=model)
train_set = CorefDataset(tokenizer, data_args, training_args, 'train')
tb_callback = TensorBoardCallback()
trainer = CorefTrainer(
tokenizer=tokenizer,
model=model,
args=training_args,
train_dataset=train_set,
# eval_dataset=dev_set,
data_collator=collator,
callbacks=[tb_callback]
)
trainer.train()
The traceback error is:
AttributeError Traceback (most recent call last)
<ipython-input-16-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()
5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1857 hf_hub_utils.enable_progress_bars()
1858 else:
-> 1859 return inner_training_loop(
1860 args=args,
1861 resume_from_checkpoint=resume_from_checkpoint,
/content/Seq2seqCoref/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
169 self._train_batch_size = batch_size
170 # Data loader and number of training steps
--> 171 train_dataloader = self.get_train_dataloader()
172
173 # Setting up training control variables:
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_train_dataloader(self)
877 dataloader_params["prefetch_factor"] = self.args.dataloader_prefetch_factor
878
--> 879 return self.accelerator.prepare(DataLoader(train_dataset, **dataloader_params))
880
881 def _get_eval_sampler(self, eval_dataset: Dataset) -> Optional[torch.utils.data.Sampler]:
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in prepare(self, device_placement, *args)
1246 )
1247
-> 1248 if self.distributed_type == DistributedType.DEEPSPEED:
1249 model_count = 0
1250 for obj in args:
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in distributed_type(self)
527 @property
528 def distributed_type(self):
--> 529 return self.state.distributed_type
530
531 @property
/usr/local/lib/python3.10/dist-packages/accelerate/state.py in __getattr__(self, name)
1074 # so we just modify the error message
1075 if name in self._known_attrs:
-> 1076 raise AttributeError(
1077 f"`AcceleratorState` object has no attribute `{name}`. "
1078 "This happens if `AcceleratorState._reset_state()` was called and "
AttributeError: `AcceleratorState` object has no attribute `distributed_type`. This happens if `AcceleratorState._reset_state()` was called and an `Accelerator` or `PartialState` was not reinitialized.
Expected behavior
The model should train when trainer.train() is called at the end of the code.
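For what it is worth, the mechanism the error message describes can be reproduced in isolation. This is a minimal sketch of that mechanism only, not the actual call path inside the trainer:

```python
from accelerate import Accelerator
from accelerate.state import AcceleratorState

accelerator = Accelerator()
print(accelerator.distributed_type)  # e.g. DistributedType.NO on a single-GPU Colab runtime

# Somewhere in the stack something equivalent to this must be happening:
AcceleratorState._reset_state()

# The shared state is now empty, so the next access raises
# AttributeError: `AcceleratorState` object has no attribute `distributed_type`.
print(accelerator.distributed_type)
```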
What is CorefTrainer? Does it make an AcceleratorState or PartialState or something? As the error hints at, somewhere along the line the state was reset without being reinitialized afterwards.
I am sorry I did not specify CorefTrainer. I am using a custom trainer (you can check it in this link).
This custom trainer is a subclass of `Seq2SeqTrainer`. None of the functions implemented in the custom trainer reset `AcceleratorState`. I went through all of the code of `Seq2SeqTrainer` and `Trainer`, and the only method I was able to identify that creates the accelerator for a trainer object is `create_accelerator_and_postprocess`. I do not know whether I must provide some configuration to avoid this error.
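One check that might help narrow this down (a diagnostic sketch, assuming the accelerator built by `create_accelerator_and_postprocess` is exposed as `trainer.accelerator`, as in recent `transformers` versions) is to read the state right before calling `trainer.train()` in the Colab cell above:

```python
# Hypothetical diagnostic: if this print already fails with the same AttributeError,
# the state was cleared somewhere between trainer construction and train();
# if it prints a DistributedType, the reset happens inside train() itself.
print(trainer.accelerator.state.distributed_type)

trainer.train()
```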
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Are there any updates on this issue? I am having the same problem, with the same package versions.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi there! Are there any updates? I am also running into this problem.
@FluorumSoc can you give me a full reproducer? It will be needed to help debug where something is being reset without being reinitialized.
@muellerzr Thanks for the reply! I hit the problem after moving `super().__init__()` in `trl/sft_trainer.py` from around line 400 up to line 148, right at the start of `SFTTrainer.__init__()`. I did this to pass all of my args to `transformers.Trainer` unchanged, instead of having them modified by `SFTConfig`. If I move the `super().__init__()` call back to its original place but leave out the `args` argument, this problem disappears. However, if I pass `args=args` in the original place (around line 400), all of my args are lost; and if I do not pass it, I run into another problem: `RuntimeError: unscale_() has already been called on this optimizer since the last update()`. I had hit that error earlier, read a lot of issues about it, and changed some code as they suggested, but in vain. A few of the issues said it happens because the fine-tuning dataset is too small. I have not figured out what to do yet.
LLM: https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese/tree/main
Using Llama-7b: https://huggingface.co/huggyllama/llama-7b/tree/main
# From trl/sft_trainer.py. This call was originally around line 412; after moving, it is at line 148.
super().__init__(
    model=model,
    # args=args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    model_init=model_init,
    compute_metrics=compute_metrics,
    callbacks=callbacks,
    optimizers=optimizers,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
env:
accelerate 0.34.0.dev0
transformers 4.45.0.dev0
trl 0.10.1
torch 2.4.0
Python 3.10
Sorry, I have only just started using GitHub and am not very familiar with it yet; apologies for the messy formatting.