
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!

Open Haruka1307 opened this issue 9 months ago • 3 comments

Hi!

I tried to run step 2 on cuda:6 since cuda:0 is in use, so I moved the batches and the model to cuda:6. I printed the device of the batch and the model inside the obtain_gradients_with_adam function to confirm this.
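To double-check the move, here is a minimal sketch (using a small stand-in model and batch, not the LESS code itself) of verifying that every model parameter and every batch tensor sit on the same device before the forward pass. Note that if the model was dispatched by accelerate with a device_map, a plain `.to(...)` may not relocate every submodule, which would be consistent with the `accelerate/hooks.py` frames in the traceback below.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere; in my setup this would be cuda:6.
device = torch.device("cuda:6" if torch.cuda.device_count() > 6 else "cpu")

model = nn.Embedding(100, 8).to(device)                        # stand-in model
batch = {"input_ids": torch.tensor([[1, 2, 3]], device=device)}  # stand-in batch

# Collect the set of devices actually holding parameters and batch tensors.
param_devices = {p.device for p in model.parameters()}
batch_devices = {t.device for t in batch.values()}
print(param_devices, batch_devices)

# Everything should be on the single target device.
assert param_devices == batch_devices == {device}
```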

But the error below still occurs:

```
Traceback (most recent call last):
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/u2019000171/cjy/LESS/less/data_selection/get_info.py", line 156, in <module>
    collect_grads(dataloader,
  File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 263, in collect_grads
    vectorized_grads = obtain_gradients_with_adam(model, batch, m, v)
  File "/home/u2019000171/cjy/LESS/less/data_selection/collect_grad_reps.py", line 121, in obtain_gradients_with_adam
    loss = model(**batch,).loss
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/peft_model.py", line 1081, in forward
    return self.base_model(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 103, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1026, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/u2019000171/.conda/envs/less/lib/python3.10/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Using Adam gradients
cuda:6 cuda:6
```

(The last two lines are my own prints: the optimizer mode and the devices of the batch and model as seen in obtain_gradients_with_adam.)

I haven't been able to figure this out...
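One thing I noticed: the `accelerate/hooks.py` frames in the traceback suggest the model was dispatched with a device_map, which can leave submodules such as `embed_tokens` pinned on cuda:0 even after moving the model. A minimal sketch of pinning the whole model to one device at load time (the `from_pretrained` call is illustrative, not the exact LESS loading code):

```python
# Hedged sketch: accelerate accepts a device_map whose empty-string key maps
# the entire module tree to a single device, so nothing stays behind on cuda:0.
load_kwargs = {
    "device_map": {"": "cuda:6"},  # every submodule -> cuda:6
}

# Illustrative only (requires downloading a model, so left commented out):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(model_name_or_path, **load_kwargs)

print(load_kwargs["device_map"])
```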

Haruka1307 avatar May 25 '24 15:05 Haruka1307