[Bug] accelerate ignores `TPU`
System Info
Latest version; tested via both `pip install -U accelerate` and `pip install git+https://github.com/huggingface/accelerate`.
Information
- [ ] My own modified scripts
- [x] The official example scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [ ] My own task or dataset (give details below)
Reproduction
While trying to fine-tune LLMs via torch/transformers on a Kaggle TPU v3-8, I get an error saying that accelerate does not recognize the TPU as a device:
`RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.`
To rule out my own code, I also tested the GoogleCloudPlatform example (a torch TPU fine-tune) and got the exact same error.
The error is thrown on `trainer = SFTTrainer(...)`. The full traceback is below:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[48], line 4
1 from trl import SFTTrainer
2 from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
----> 4 trainer = SFTTrainer(
5 model=base_model,
6 train_dataset=data,
7 args=TrainingArguments(
8 per_device_train_batch_size=BATCH_SIZE, # This is actually the global batch size for SPMD.
9 num_train_epochs=1,
10 max_steps=-1,
11 output_dir="/output_dir",
12 optim="adafactor",
13 logging_steps=1,
14 dataloader_drop_last = True, # Required for SPMD.
15 fsdp="full_shard",
16 fsdp_config=fsdp_config,
17 ),
18 peft_config=lora_config,
19 dataset_text_field="quote",
20 max_seq_length=max_seq_length,
21 packing=True,
22 )
File /usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:101, in _deprecate_arguments.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
99 message += "\n\n" + custom_message
100 warnings.warn(message, FutureWarning)
--> 101 return f(*args, **kwargs)
File /usr/local/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:401, in SFTTrainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics, peft_config, dataset_text_field, packing, formatting_func, max_seq_length, infinite, num_of_sequences, chars_per_token, dataset_num_proc, dataset_batch_size, neftune_noise_alpha, model_init_kwargs, dataset_kwargs, eval_packing)
395 if tokenizer.padding_side is not None and tokenizer.padding_side != "right":
396 warnings.warn(
397 "You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to "
398 "overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code."
399 )
--> 401 super().__init__(
402 model=model,
403 args=args,
404 data_collator=data_collator,
405 train_dataset=train_dataset,
406 eval_dataset=eval_dataset,
407 tokenizer=tokenizer,
408 model_init=model_init,
409 compute_metrics=compute_metrics,
410 callbacks=callbacks,
411 optimizers=optimizers,
412 preprocess_logits_for_metrics=preprocess_logits_for_metrics,
413 )
415 # Add tags for models that have been loaded with the correct transformers version
416 if hasattr(self.model, "add_model_tags"):
File /usr/local/lib/python3.10/site-packages/transformers/trainer.py:411, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
408 self.deepspeed = None
409 self.is_in_train = False
--> 411 self.create_accelerator_and_postprocess()
413 # memory metrics - must set up as early as possible
414 self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)
File /usr/local/lib/python3.10/site-packages/transformers/trainer.py:4858, in Trainer.create_accelerator_and_postprocess(self)
4855 args.update(accelerator_config)
4857 # create accelerator object
-> 4858 self.accelerator = Accelerator(**args)
4859 # some Trainer classes need to use `gather` instead of `gather_for_metrics`, thus we store a flag
4860 self.gather_function = self.accelerator.gather_for_metrics
File /usr/local/lib/python3.10/site-packages/accelerate/accelerator.py:349, in Accelerator.__init__(self, device_placement, split_batches, mixed_precision, gradient_accumulation_steps, cpu, dataloader_config, deepspeed_plugin, fsdp_plugin, megatron_lm_plugin, rng_types, log_with, project_dir, project_config, gradient_accumulation_plugin, step_scheduler_with_optimizer, kwargs_handlers, dynamo_backend, deepspeed_plugins)
345 raise ValueError(f"FSDP requires PyTorch >= {FSDP_PYTORCH_VERSION}")
347 if fsdp_plugin is None: # init from env variables
348 fsdp_plugin = (
--> 349 FullyShardedDataParallelPlugin() if os.environ.get("ACCELERATE_USE_FSDP", "false") == "true" else None
350 )
351 else:
352 if not isinstance(fsdp_plugin, FullyShardedDataParallelPlugin):
File <string>:21, in __init__(self, sharding_strategy, backward_prefetch, mixed_precision_policy, auto_wrap_policy, cpu_offload, ignored_modules, state_dict_type, state_dict_config, optim_state_dict_config, limit_all_gathers, use_orig_params, param_init_fn, sync_module_states, forward_prefetch, activation_checkpointing, cpu_ram_efficient_loading, transformer_cls_names_to_wrap, min_num_params)
File /usr/local/lib/python3.10/site-packages/accelerate/utils/dataclasses.py:1684, in FullyShardedDataParallelPlugin.__post_init__(self)
1682 device = torch.xpu.current_device()
1683 else:
-> 1684 raise RuntimeError(
1685 "There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'."
1686 )
1687 # Create a function that will be used to initialize the parameters of the model
1688 # when using `sync_module_states`
1689 self.param_init_fn = lambda x: x.to_empty(device=device, recurse=False)
RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.
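For what it's worth, the failure can be reproduced without the whole Trainer stack. The sketch below is a minimal repro on the same Kaggle TPU v3-8 VM (hypothetical, only checked in that setup); it just instantiates the FSDP plugin with `sync_module_states=True`, which is the code path the traceback ends in:

```python
# Minimal repro sketch: on a TPU VM (no CUDA/XPU/NPU device visible to torch),
# constructing the FSDP plugin with sync_module_states=True hits the same
# device check in accelerate/utils/dataclasses.py and raises.
from accelerate import FullyShardedDataParallelPlugin

try:
    FullyShardedDataParallelPlugin(sync_module_states=True)
except RuntimeError as e:
    # -> There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'.
    print(e)
```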
I also upgraded transformers, peft, and trl to the latest versions, but got the same error.
Expected behavior
accelerate should detect the TPU and not raise this error from accelerate/utils/dataclasses.py.
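For illustration only, this is roughly the kind of fallback I would expect from the device resolution (a sketch, not accelerate's actual code; `pick_fsdp_init_device` is a made-up helper name, and it assumes torch_xla is installed, as it is on a Kaggle TPU VM):

```python
# Sketch of the expected behaviour: fall back to the XLA device when no
# XPU/CUDA/NPU device is available, instead of raising. Not accelerate's code.
import importlib.util

import torch


def pick_fsdp_init_device():
    if torch.cuda.is_available():
        return torch.cuda.current_device()
    if importlib.util.find_spec("torch_xla") is not None:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()  # e.g. device 'xla:0' on a TPU v3-8
    raise RuntimeError(
        "There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'."
    )
```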
Update:
The `RuntimeError: There are currently no available devices found, must be one of 'XPU', 'CUDA', or 'NPU'` error is not thrown on transformers==4.38.2, and a Llama-3 fine-tune completes successfully on the TPU VM.
However, Llama-3.1 requires a newer version of transformers, so this is a dead end.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.