AssertionError: Torch not compiled with CUDA enabled
Hi, I tried to run LLMLingua using dolphin-2.6-phi-2, but I got:
AssertionError: Torch not compiled with CUDA enabled
PS C:\Users\DefaultUser> python "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py"
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 12, in <module>
llm_lingua = LocalPromptCompressor()
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 27, in __init__
self.load_model(model_name, device_map, model_config)
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 57, in load_model
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\transformers\models\auto\auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 3706, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 4116, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 778, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "C:\Program Files\Lib\site-packages\accelerate\utils\modeling.py", line 347, in set_module_tensor_to_device
new_value = value.to(device)
^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Uninstall your PyTorch first: !pip uninstall torch torchvision torchaudio -y
Then reinstall it using the command from the official PyTorch website.
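As a quick sanity check after reinstalling, a short snippet like this (standard PyTorch calls, nothing LLMLingua-specific) can confirm whether the installed build actually has CUDA support:

import torch

# True only if this PyTorch build was compiled with CUDA and a GPU is visible
print(torch.cuda.is_available())

# CUDA version the wheel was built against; None for CPU-only builds
print(torch.version.cuda)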
I'm facing the same issue; I tried uninstalling and reinstalling torch, but the error is the same.
I do not have an NVIDIA GPU or the CUDA platform on my PC. Is there a way to run it without a GPU or CUDA?
Thanks, I reinstalled PyTorch as suggested and it worked, but I encountered another error:
Enter your contexts: Test
Enter your question: What is in the context^
Traceback (most recent call last):
File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 44, in <module>
compressed_prompt = llm_lingua.compress_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 252, in compress_prompt
context = self.iterative_compress_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 761, in iterative_compress_prompt
self_loss, self_past_key_values = self.get_ppl(
^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 105, in get_ppl
response = self.model(
^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 960, in forward
hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 919, in forward
hidden_states = self.embd(input_ids)
^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 78, in forward
input_ids = input_ids.view(-1, input_shape[-1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
I ran it again and got a different error:
Enter your contexts: Test
Enter your question: What is in the context?
Traceback (most recent call last):
File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 44, in <module>
compressed_prompt = llm_lingua.compress_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 252, in compress_prompt
context = self.iterative_compress_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 804, in iterative_compress_prompt
threshold = self.get_estimate_threshold_base_distribution(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 624, in get_estimate_threshold_base_distribution
ppl.sort(descending=not condition_flag)
IndexError: index 0 is out of bounds for dimension 0 with size 0
JiHa, try a different LLM for the compressor, like below:
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
You should try the notebook examples first to make sure everything runs before writing your own code: clone the repo and work through the notebooks.
Avkashhirpara, if you don't have a GPU, change device_map to "cpu". From the documentation:
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # Default model
    device_map="cuda",  # Device environment (e.g., 'cuda', 'cpu', 'mps')
    model_config={},  # Configuration for the Huggingface model
    open_api_config={},  # Configuration for OpenAI Embedding
)
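For a CPU-only machine, a minimal sketch would be the same constructor with device_map swapped to "cpu" (expect slow inference with a 7B model on CPU):

from llmlingua import PromptCompressor

# Minimal sketch for a machine without CUDA: run the compressor on CPU
llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",
    device_map="cpu",
)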
Thank you @samvanity for the clarification; that's correct. Hi @Avkashhirpara, you can switch the compute device by using different 'device_map' settings, following @samvanity's example.
Hi @JiHa-Kim, I think this error might be due to incorrect inputs. Could you provide more context about your case?
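For reference, a well-formed call passes the context as a list of non-empty strings, along the lines of the README example (the keyword names here follow that example and may vary between versions):

compressed_prompt = llm_lingua.compress_prompt(
    ["The quick brown fox jumps over the lazy dog. " * 20],  # context: a list of non-empty strings
    instruction="Summarize the context.",
    question="What is in the context?",
    target_token=200,
)
print(compressed_prompt["compressed_prompt"])

Both tracebacks above end in empty tensors (a reshape of 0 elements, a sort over a size-0 dimension), which is consistent with the context tokenizing to nothing after preprocessing, so very short inputs like "Test" are worth ruling out first.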
Thanks @samvanity, it works for me now.
If you are on a MacBook, you have to set device_map="mps" to make it work:
llm_lingua = PromptCompressor(device_map="mps")
Source: https://stackoverflow.com/a/60619616/902102