accelerate deepspeed stage 3 num_processes=2: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
System Info
The config was generated with accelerate config, with no manual modification.
Using DeepSpeed stage 3.
Using the TRL SFTTrainer and just passing the args to the trainer; a minimal sketch of this setup is included below.
Then launching with accelerate launch shows this error. Why?
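The actual script was not shared, so the following is only an illustrative sketch of the setup described above; the checkpoint name, dataset, and output directory are assumptions, launched with "accelerate launch train.py":

# train.py -- illustrative sketch only, not the reporter's actual script
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"            # assumed checkpoint
dataset = load_dataset("imdb", split="train")      # assumed dataset with a "text" column

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Just pass the args to the trainer, as described above.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(output_dir="./out", per_device_train_batch_size=1),
)
trainer.train()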
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [X] My own task or dataset (give details below)
Reproduction
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: false
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: TENSORRT
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
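As a sanity check (not part of the original report), a tiny script launched with this config should show the two processes bound to different GPUs, which is where the cuda:0 / cuda:1 pair in the error comes from; the config filename here is an assumption:

# check_devices.py -- run with: accelerate launch --config_file default_config.yaml check_devices.py
from accelerate import Accelerator

accelerator = Accelerator()
# With num_processes: 2, this is expected to print (0, cuda:0) on one process
# and (1, cuda:1) on the other.
print(accelerator.process_index, accelerator.device)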
Expected behavior
run normally
Please provide us with your code and the full stack trace/error log.
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=nf4_config,
trust_remote_code=True,
device_map={"": Accelerator().local_process_index},
)
I found that in your example script device_map should be written like this, but there is no documentation explaining how it works.
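A hedged reading of that line (my interpretation, not official documentation): under accelerate launch with two processes, local_process_index is 0 on one process and 1 on the other, so the mapping pins the entire model to that process's own GPU rather than letting from_pretrained shard it across GPUs.

from accelerate import Accelerator

# {"": device} maps the root module (the empty-string key) to a single device,
# so with num_processes: 2 each rank loads a full copy on its own GPU.
device_map = {"": Accelerator().local_process_index}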
After making this modification, running accelerate launch shows:
Traceback (most recent call last):
  File "model.py.py", line 197, in
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1936, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/stage3.py", line 2093, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "./venv/lib/python3.11/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "./venv/lib/python3.11/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "./venv/lib/python3.11/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 169, in backward
    ctx.pre_backward_function(ctx.module)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 445, in _run_before_backward_function
    self.pre_sub_module_backward_function(sub_module)
  File "./venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 527, in pre_sub_module_backward_function
    param_coordinator.fetch_sub_module(sub_module, forward=False)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 284, in fetch_sub_module
    self.__all_gather_params(params_to_fetch, forward)
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 428, in __all_gather_params
    self._all_gather_params(nonquantized_params, forward, quantize=self.zero_quantized_weights)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 446, in _all_gather_params
    handle = partitioned_params[0].all_gather_coalesced(partitioned_params,
  File "./venv/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 1155, in all_gather_coalesced
    dtype=get_only_unique_item(p.ds_tensor.dtype
  File "./venv/lib/python3.11/site-packages/deepspeed/runtime/utils.py", line 942, in get_only_unique_item
    raise RuntimeError(f"expected there to be only one unique element in {items}")
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.
In my scripts, I don't use device_map; Accelerate handles it on its own.
What happens if you don't pass the device_map arg?
If I do not use device_map, then during computation it raises the "tensors not on the same device" error when using Accelerate with DeepSpeed and num_processes=2.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This action is abused; it should be renamed to "we won't look it up for you", and it should be deleted automatically.
Hello @aohan237 ,
DeepSpeed isn't compatible with bitsandbytes 4-bit quantization. It is also not compatible with torch compile.
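Following that statement, a ZeRO-3 run would load the model without the bitsandbytes config and without a device_map, and would drop dynamo_backend: TENSORRT from the accelerate config to avoid the torch compile path. This is a hedged sketch, not the reporter's code; model_name is the same checkpoint used in the snippet above:

from transformers import AutoModelForCausalLM

# Plain (non-quantized) load for DeepSpeed ZeRO-3: ZeRO-3 partitions the
# weights across the two processes itself, so no quantization_config or
# device_map is passed here.
model = AutoModelForCausalLM.from_pretrained(
    model_name,          # same checkpoint as above
    trust_remote_code=True,
    torch_dtype="auto",
)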
@pacman100
Thanks for the reply.
Lots of articles say that DeepSpeed is a VRAM saver, reducing VRAM usage by something like 10x, so I thought using DeepSpeed should really lower my VRAM usage,
but it is not working.
I really did read the docs a few times; the DeepSpeed documentation does not say anything about its limitations or where it should be used.
Can you please give me an example or some references? Where should I read or learn about what DeepSpeed should be used for?
Thanks in advance.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.