accelerate deepspeed stage 3 num_processes=2: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
System Info
The config was generated with accelerate config, with no manual modification.
Using DeepSpeed stage 3.
Using the TRL SFTTrainer and just passing the args to the trainer; a minimal sketch of this setup is included below.
Then launching with accelerate launch shows this error. Why?
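The actual script was not shared, so the following is only an illustrative sketch of the setup described above; the checkpoint name, dataset, and output directory are assumptions, launched with "accelerate launch train.py":

# train.py -- illustrative sketch only, not the reporter's actual script
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"            # assumed checkpoint
dataset = load_dataset("imdb", split="train")      # assumed dataset with a "text" column

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Just pass the args to the trainer, as described above.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(output_dir="./out", per_device_train_batch_size=1),
)
trainer.train()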
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [X] My own task or dataset (give details below)
Reproduction
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: TENSORRT
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
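As a sanity check (not part of the original report), a tiny script launched with this config should show the two processes bound to different GPUs, which is where the cuda:0 / cuda:1 pair in the error comes from; the config filename here is an assumption:

# check_devices.py -- run with: accelerate launch --config_file default_config.yaml check_devices.py
from accelerate import Accelerator

accelerator = Accelerator()
# With num_processes: 2, this is expected to print (0, cuda:0) on one process
# and (1, cuda:1) on the other.
print(accelerator.process_index, accelerator.device)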
Expected behavior
run normally
Please provide us with your code and the full stack trace/error log.
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=nf4_config,
trust_remote_code=True,
device_map={"": Accelerator().local_process_index},
)
I found that in your example script device_map should be written like this, but there is no documentation explaining how it works.
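A hedged reading of that line (my interpretation, not official documentation): under accelerate launch with two processes, local_process_index is 0 on one process and 1 on the other, so the mapping pins the entire model to that process's own GPU rather than letting from_pretrained shard it across GPUs.

from accelerate import Accelerator

# {"": device} maps the root module (the empty-string key) to a single device,
# so with num_processes: 2 each rank loads a full copy on its own GPU.
device_map = {"": Accelerator().local_process_index}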
After making this modification, running accelerate launch shows:
Traceback (most recent call last):
  File "model.py.py", line 197, in
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1936, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/stage3.py", line 2093, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "./venv/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "./venv/lib/python3.11/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 169, in backward
    ctx.pre_backward_function(ctx.module)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 445, in _run_before_backward_function
    self.pre_sub_module_backward_function(sub_module)
  File "./venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 527, in pre_sub_module_backward_function
    param_coordinator.fetch_sub_module(sub_module, forward=False)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
    self.__all_gather_params(params_to_fetch, forward)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
    self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
    handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
    dtype=get_only_unique_item(p.ds_tensor.dtype
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/utils.py", line 942, in get_only_unique_item
    raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.
In my scripts, I don't use device_map; Accelerate handles it on its own.
What happens if you don't pass the device_map arg?
If I do not use device_map, then during computation it raises the "tensors not on the same device" error when using Accelerate with DeepSpeed and num_processes=2.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This action is abused; it should be renamed to "we won't look it up for you", and it should be deleted automatically.
Hello @aohan237 ,
DeepSpeed isn't compatible with bitsandbytes 4-bit quantization. It is also not compatible with torch compile.
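Following that statement, a ZeRO-3 run would load the model without the bitsandbytes config and without a device_map, and would drop dynamo_backend: TENSORRT from the accelerate config to avoid the torch compile path. This is a hedged sketch, not the reporter's code; model_name is the same checkpoint used in the snippet above:

from transformers import AutoModelForCausalLM

# Plain (non-quantized) load for DeepSpeed ZeRO-3: ZeRO-3 partitions the
# weights across the two processes itself, so no quantization_config or
# device_map is passed here.
model = AutoModelForCausalLM.from_pretrained(
    model_name,          # same checkpoint as above
    trust_remote_code=True,
    torch_dtype="auto",
)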
@pacman100
Thanks for the reply.
Lots of articles say that DeepSpeed is a VRAM saver, reducing VRAM usage by something like 10x, so I thought using DeepSpeed should really lower my VRAM usage,
but it is not working.
I really did read the docs a few times; the DeepSpeed documentation does not say anything about its limitations or where it should be used.
Can you please give me an example or some references? Where should I read or learn about what DeepSpeed should be used for?
Thanks in advance.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.