[Bug] Example stops with error when running with GPU (CUDA) "Expected all tensors to be on the same device, but found at least two devices"
Is this a new bug?
- [X] I believe this is a new bug
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When running the example notebook abstractive-question-answering.ipynb, I get the following error in cell 18 when calling generate_answer(query):
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[38], line 1
----> 1 generate_answer(query)
Cell In[37], line 5, in generate_answer(query)
3 inputs = tokenizer([query], max_length=1024, return_tensors="pt")
4 # use generator to predict output ids
----> 5 ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
6 # use tokenizer to decode the output ids
7 answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:1329, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
1321 logger.warning(
1322 "A decoder-only architecture is being used, but right-padding was detected! For correct "
1323 "generation results, please set `padding_side='left'` when initializing the tokenizer."
1324 )
1326 if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs:
1327 # if model is encoder decoder encoder_outputs are created
1328 # and added to `model_kwargs`
-> 1329 model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
1330 inputs_tensor, model_kwargs, model_input_name
1331 )
1333 # 5. Prepare `input_ids` which will be used for auto-regressive generation
1334 if self.config.is_encoder_decoder:
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:642, in GenerationMixin._prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name)
640 encoder_kwargs["return_dict"] = True
641 encoder_kwargs[model_input_name] = inputs_tensor
--> 642 model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
644 return model_kwargs
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/models/bart/modeling_bart.py:811, in BartEncoder.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
808 raise ValueError("You have to specify either input_ids or inputs_embeds")
810 if inputs_embeds is None:
--> 811 inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
813 embed_pos = self.embed_positions(input)
814 embed_pos = embed_pos.to(inputs_embeds.device)
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/sparse.py:162, in Embedding.forward(self, input)
161 def forward(self, input: Tensor) -> Tensor:
--> 162 return F.embedding(
163 input, self.weight, self.padding_idx, self.max_norm,
164 self.norm_type, self.scale_grad_by_freq, self.sparse)
File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/functional.py:2210, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2204 # Note [embedding_renorm set_grad_enabled]
2205 # XXX: equivalent to
2206 # with torch.no_grad():
2207 # torch.embedding_renorm_
2208 # remove once script supports set_grad_enabled
2209 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Expected Behavior
The notebook should run without errors.
Steps To Reproduce
On a system with an NVIDIA GPU supported by PyTorch:
- clone the repo
- start jupyter lab
- open and run abstractive-question-answering.ipynb; the failure can also be reproduced with the standalone sketch below
Relevant log output
N/A (see the traceback under Current Behavior)
Environment
- **OS**: Ubuntu 22.04
- **Language version**: Python 3.11.3
- **Pinecone client version**: 2.2.2
Additional Context
The root cause is a device mismatch: the generator model lives on the GPU, but the tokenized inputs are created on the CPU, so generate() mixes cuda:0 and cpu tensors. To fix the bug, change the following line in the function generate_answer(query):
inputs = tokenizer([query], max_length=1024, return_tensors="pt")
to:
inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)