localGPT
AssertionError: Torch not compiled with CUDA enabled
I have PyTorch with CUDA enabled:
# Name Version Build Channel
pytorch 2.0.1 py3.11_cuda11.8_cudnn8_0 pytorch
pytorch-cuda 11.8 h24eeafa_5 pytorch
pytorch-mutex 1.0 cuda pytorch
This error message needs improvement. What is the actual problem? Does requirements.txt need to be updated to include the correct PyTorch version?
OS Name Microsoft Windows 10 Pro Version 10.0.19045 Build 19045
If I run python ingest.py --device_type cpu it works OK, but then when I try to run python run_localGPT.py --device_type cpu it still fails with AssertionError: Torch not compiled with CUDA enabled. So the README is incorrect when it says "if you do not have a GPU and want to run this on CPU, now you can do that". :/
I have the exact same problem
@endolith I will have a look at the code and see what is causing this.
I do get this error running on an Apple M1 with PyTorch compiled against MPS and running the script with
python run_localGPT.py --device_type=mps
I fixed this issue by installing requirements.txt through conda. I'll summarize the steps I followed below:
Install conda for your platform here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
Then I ran the following commands from the YouTube video:
conda create -n localGPT
conda activate localGPT
# Your terminal line should now start with (localGPT)
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT
pip install -r .\requirements.txt
If it worked, the output of torch.cuda.is_available() should be True.
python3
Python 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
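A common reason for the mismatch reported above (conda lists a CUDA build, yet the script raises the assertion) is that the script runs under a different interpreter or picks up a different torch wheel than the one inspected. The following is a minimal sketch for checking this, not part of localGPT; the helper name describe_torch_build is hypothetical. It relies only on torch.__version__, torch.version.cuda (which is None when torch was not built with CUDA) and torch.cuda.is_available().

```python
import sys
from typing import Optional


def describe_torch_build(version: str, cuda_version: Optional[str]) -> str:
    """Summarize whether a PyTorch build was compiled with CUDA.

    `version` is torch.__version__ and `cuda_version` is torch.version.cuda,
    which is None when the installed build was not compiled with CUDA.
    (Hypothetical helper, for illustration only.)
    """
    if cuda_version is None:
        return f"torch {version}: not compiled with CUDA"
    return f"torch {version}: compiled against CUDA {cuda_version}"


if __name__ == "__main__":
    # Shows which interpreter is actually running, to spot environment mix-ups.
    print("interpreter:", sys.executable)
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(describe_torch_build(torch.__version__, torch.version.cuda))
        print("CUDA usable at runtime:", torch.cuda.is_available())
```

Running it with the same python command used for run_localGPT.py shows whether that interpreter sees the CUDA build.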
I do get this error running on an Apple M1 with PyTorch compiled against MPS and running the script with
python run_localGPT.py --device_type=mps
Should I create a separate issue for this regarding the Apple M1/M2 MPS support? @PromtEngineer
@endolith I will have a look at the code and see what is causing this.
Here is the traceback:
python run_localGPT.py --device_type=cpu
(localgpt) λ python ingest.py --device_type=cpu
2023-06-19 15:05:22,591 - INFO - ingest.py:107 - Loading documents from D:\Documents\localgpt/SOURCE_DOCUMENTS
2023-06-19 15:05:27,798 - INFO - ingest.py:111 - Loaded 1 documents from D:\Documents\localgpt/SOURCE_DOCUMENTS
2023-06-19 15:05:27,800 - INFO - ingest.py:112 - Split into 72 chunks of text
2023-06-19 15:05:34,645 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-06-19 15:05:41,533 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-06-19 15:05:42,666 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: D:\Documents\localgpt/DB
2023-06-19 15:05:42,738 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-06-19 15:05:42,762 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-06-19 15:05:42,997 - INFO - duckdb.py:460 - loaded in 288 embeddings
2023-06-19 15:05:42,999 - INFO - duckdb.py:472 - loaded in 1 collections
2023-06-19 15:05:43,003 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-06-19 15:07:07,517 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB
2023-06-19 15:07:07,730 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB
(localgpt) λ python run_localGPT.py --device_type=cpu
2023-06-19 15:10:45,346 - INFO - run_localGPT.py:161 - Running on: cpu
2023-06-19 15:10:45,347 - INFO - run_localGPT.py:162 - Display Source Documents set to: False
2023-06-19 15:10:45,899 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-06-19 15:10:49,728 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-06-19 15:10:49,763 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: D:\Documents\localgpt/DB
2023-06-19 15:10:49,827 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-06-19 15:10:49,850 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-06-19 15:10:49,943 - INFO - duckdb.py:460 - loaded in 360 embeddings
2023-06-19 15:10:49,946 - INFO - duckdb.py:472 - loaded in 1 collections
2023-06-19 15:10:49,948 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-06-19 15:10:49,949 - INFO - run_localGPT.py:43 - Loading Model: TheBloke/WizardLM-7B-uncensored-GPTQ, on: cpu
2023-06-19 15:10:49,950 - INFO - run_localGPT.py:44 - This action can take a few minutes!
2023-06-19 15:10:49,950 - INFO - run_localGPT.py:49 - Using AutoGPTQForCausalLM for quantized models
2023-06-19 15:10:50,912 - INFO - run_localGPT.py:56 - Tokenizer loaded
2023-06-19 15:10:51,225 - INFO - _base.py:727 - lm_head not been quantized, will be ignored when make_quant.
2023-06-19 15:10:51,230 - WARNING - qlinear_old.py:16 - CUDA extension not installed.
2023-06-19 15:10:52,316 - WARNING - modeling.py:1035 - The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2023-06-19 15:10:52,331 - WARNING - modeling.py:928 - The safetensors archive passed at C:\Users\endolith/.cache\huggingface\hub\models--TheBloke--WizardLM-7B-uncensored-GPTQ\snapshots\dcb3400039f15cff76b43a4921c59d47c5fc2252\WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Traceback (most recent call last):
File "D:\Documents\localgpt\run_localGPT.py", line 228, in <module>
main()
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Documents\localgpt\run_localGPT.py", line 197, in main
llm = load_model(device_type, model_id=model_id, model_basename=model_basename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Documents\localgpt\run_localGPT.py", line 58, in load_model
model = AutoGPTQForCausalLM.from_quantized(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
return quant_func(
^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\auto_gptq\modeling\_base.py", line 773, in from_quantized
accelerate.utils.modeling.load_checkpoint_in_model(
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\accelerate\utils\modeling.py", line 1094, in load_checkpoint_in_model
checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\accelerate\utils\modeling.py", line 946, in load_state_dict
return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\safetensors\torch.py", line 261, in load_file
result[k] = f.get_tensor(k)
^^^^^^^^^^^^^^^
File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
2023-06-19 15:10:52,373 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB
Some updates and partial success on my M1:
"cuda" is hardcoded in https://github.com/PromtEngineer/localGPT/blob/979f912d07d40704d105c92b4f20a6a5b8df0c6a/run_localGPT.py#L63
This probably should take the device_type as an input. @PromtEngineer
I changed this locally and it starts.
But then run into autocast issues:
2023-06-19 21:04:34,844 - INFO - run_localGPT.py:108 - Local LLM Loaded
Enter a query: what is the senate?
Traceback (most recent call last):
File "/Users/christianweyer/Sources/localGPT/run_localGPT.py", line 228, in <module>
main()
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/Sources/localGPT/run_localGPT.py", line 206, in main
res = qa(query)
^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
raise e
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
self._call(inputs, run_manager=run_manager)
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 120, in _call
answer = self.combine_documents_chain.run(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 239, in run
return self(kwargs, callbacks=callbacks)[self.output_keys[0]]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
raise e
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
self._call(inputs, run_manager=run_manager)
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
output, extra_return_dict = self.combine_docs(
^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/combine_documents/stuff.py", line 87, in combine_docs
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 213, in predict
return self(kwargs, callbacks=callbacks)[self.output_key]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
raise e
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
self._call(inputs, run_manager=run_manager)
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 69, in _call
response = self.generate([inputs], run_manager=run_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 79, in generate
return self.llm.generate_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 134, in generate_prompt
return self.generate(prompt_strings, stop=stop, callbacks=callbacks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 191, in generate
raise e
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 185, in generate
self._generate(prompts, stop=stop, run_manager=run_manager)
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 436, in _generate
self._call(prompt, stop=stop, run_manager=run_manager)
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/huggingface_pipeline.py", line 168, in _call
response = self.pipeline(prompt)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 201, in __call__
return super().__call__(text_inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1120, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1127, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1026, in forward
model_outputs = self._forward(model_inputs, **forward_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 263, in _forward
generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 422, in generate
with torch.inference_mode(), torch.amp.autocast(device_type=self.device.type):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 218, in __init__
raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'
Trying to chase this one now...
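The idea of taking device_type as an input instead of hardcoding "cuda" can be sketched roughly as below. This is not the code in run_localGPT.py; resolve_device is a hypothetical helper, and the two availability flags are assumed to come from torch.cuda.is_available() and torch.backends.mps.is_available() in a real script. It does not address the autocast error shown in the traceback, which is raised inside auto_gptq.

```python
def resolve_device(requested: str, cuda_ok: bool, mps_ok: bool) -> str:
    """Return the device string to load the model on.

    Falls back to "cpu" when the requested accelerator is not available,
    instead of letting a hardcoded "cuda" trigger an assertion later.
    (Hypothetical helper; the fallback policy is an assumption.)
    """
    requested = requested.lower()
    if requested not in {"cpu", "cuda", "mps"}:
        raise ValueError(f"unsupported device_type: {requested!r}")
    if requested == "cuda" and not cuda_ok:
        return "cpu"
    if requested == "mps" and not mps_ok:
        return "cpu"
    return requested


# Example with the availability flags passed in explicitly:
print(resolve_device("cuda", cuda_ok=False, mps_ok=False))
print(resolve_device("mps", cuda_ok=False, mps_ok=True))
```

Falling back silently is one design choice; raising an error with a clear message would be the stricter alternative and may be preferable when the user explicitly asked for a GPU.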
@ChristianWeyer this seems to be a bug, thanks for highlighting it. I am not sure if auto_gptq supports M1/M2. Will need to test that.
Seems it does not: https://github.com/PanQiWei/AutoGPTQ/issues/133#issuecomment-1575002893
Which means we cannot use LocalGPT on M1/M2 with quantized models for now. Thanks!
@ChristianWeyer I finally got a M2 and just tested it, that is the case. Need to figure out if there is another way.
BTW @PromtEngineer: the current code checks for CUDA explicitly for full models, which makes it unusable for MPS: https://github.com/PromtEngineer/localGPT/blob/main/run_localGPT.py#L68
@PromtEngineer did you find the error that @endolith mentioned earlier? Even though I have a conda environment, I still get an AssertionError: Torch not compiled with CUDA enabled error when I run python run_localGPT.py --device_type cpu
yeah, same here. conda on windows cpu.
I did have the Torch not compiled with CUDA enabled error on my Windows 11 with Nvidia RTX4070, using conda. I made sure I was in the conda env then ran: pip install torch===2.0.1+cu118 torchvision===0.15.2 -f https://download.pytorch.org/whl/torch_stable.html
This solved my issue.
@ChristianWeyer I finally got a M2 and just tested it, that is the case. Need to figure out if there is another way.
Do you already have an idea here? Thx!
@OssBozier @mindwellsolutions Are you trying to run it with CUDA, though? My GPU doesn't have enough memory so I'm trying to run it without. Yours might count as a different bug? Needs that particular version added to requirements.txt?
(Actually I guess in my original comment I was trying to run it with CUDA, so maybe my second comment is what should be in a separate bug.)
I was just trying to get it to work at all. I am happy to run it with my GPU as I have enough GPU memory. I did only read your first comment before responding back here. I do see your desire to run the CPU option now in your second post. Not sure how to help with that one.
Just pushed a fix for it. Let me know if there is still the same issue.
BTW @PromtEngineer: the current code checks for CUDA explicitly for full models, which makes it unusable for MPS: https://github.com/PromtEngineer/localGPT/blob/main/run_localGPT.py#L68
For my M2, I get better performance with LlamaForCausalLM compared to AutoForCausalLM. That's why I had it that way so it will be using LlamaForCausalLM for both cpu and mps. There was a bug and now it should be resolved.
If I run python run_localGPT.py --device_type=cpu I no longer get the CUDA error, but I then get repeated TimeoutError: The read operation timed out while trying to download the vicuña model.
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
So then I manually downloaded the .bin files and tried to put them in the huggingface hub cache folder, but then I get
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 180355072 bytes.
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 90177536 bytes.
etc.
So I think it's just not going to work. https://github.com/PromtEngineer/localGPT/discussions/186
What worked for me: Windows 10, RTX 3080, CUDA 12.1, Python 3.10, pip 23.1.2, running inside a virtual env in PowerShell. I installed everything as described in the tutorial:
pip install -r requirements.txt
--check if CUDA is installed, the version should be visible in the top right corner
nvidia-smi.exe
--check if torch is installed
pip list | findstr torch
--if you see something like torch 2.0.1 torchvision 0.15.2
--then torch probably was compiled without CUDA
--check with another method
python -c 'import torch; print(torch.cuda.is_available())'
--if the next line is "False" then torch wasn't compiled with CUDA
--clean up
pip uninstall torch torchvision torchaudio
pip cache purge
pip list | findstr torch
--go to this site and get the proper command for your system and CUDA installation
--https://pytorch.org/get-started/locally/
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
--this downloaded and installed torch with CUDA
--check again
pip list | findstr torch
python -c 'import torch; print(torch.cuda.is_available())'
--output should be "True"
--try to ingest
python ingest.py
I tried this and it worked, thanks for defining it step by step!
But now I have the next problem...I was able to ingest the document, but I couldn't run it due to
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 124.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 6.95 GiB is allocated by PyTorch, and 289.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I tried changing the EMBEDDING_MODEL_NAME to instructor-base, but it seems it's not yet small enough.
I have an RTX3070, so only 8 GB VRAM unfortunately...anyone know what instructor model would work? or something else that might be done?
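The error message itself points at max_split_size_mb and PYTORCH_CUDA_ALLOC_CONF. As a minimal sketch of how that setting is usually applied: the environment variable has to be set before PyTorch initializes its CUDA allocator, so setting it before importing torch is the simplest approach. The value 128 below is an arbitrary illustration, not a recommendation, and alloc_conf is a hypothetical helper. Since the message above reports 6.95 GiB already allocated on an 8 GiB card with only 289 MiB reserved but unallocated, this setting, which targets fragmentation, may well not be enough here.

```python
import os


def alloc_conf(max_split_size_mb: int) -> str:
    """Build a PYTORCH_CUDA_ALLOC_CONF value limiting the allocator's split size.

    (Hypothetical helper; only the max_split_size_mb option is covered.)
    """
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    return f"max_split_size_mb:{max_split_size_mb}"


# 128 is an illustrative value only (assumption).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)

# Import torch only after this point so the allocator sees the setting.
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```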
@adjiap In your case the bottleneck is the RAM you have. You probably want to set the RAM here .
Hi @adjiap! Were you able to solve the memory problem? Thanks in advance
I'm rather new to the intricacies of machine learning models and embeddings, but here I get to learn a few things :)
In the end I was able to run the project by using the base instructor (using my GPU) during ingest.py, but running run_localGPT.py using CPU.
This brings me to the next problem of having too little RAM, as Vicuna-7B takes up 30 GB of my 32 GB RAM (system RAM, not GPU VRAM). Though it works, answering questions is really slow. I haven't tried PromtEngineer's suggestion about setting the RAM there (as I'm not sure yet what the argument actually does), because Vicuna-7B, afaik, is 30 GB, and if it's limited to something smaller, like 5 GB, it would probably not work as intended.
A colleague of mine helped me by using his machine with dual RTX 2080 Ti, with 12 GB VRAM each, and he was able to run ingest.py and run_localGPT.py with no issue, though he did show me that while run_localGPT.py was running, both of his GPUs were holding a 9 GB load.
tl;dr: Vicuna 7B and the large instructor don't work without at least 20 GB of GPU VRAM in total. The embedding step (ingest.py) would still work if I were to use the base instructor, but the actual model execution doesn't.
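The memory figures above are roughly what a back-of-the-envelope estimate gives. The sketch below counts the weights only and assumes a "7B" model has about 7 billion parameters (an approximation); activations, the KV cache and the embedding model come on top, so real usage is higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: a "7B" model has about 7 billion parameters

for label, bits in [("float32", 32), ("float16", 16), ("4-bit quantized", 4)]:
    print(f"{label:>16}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the weights take about 26 GiB in float32, about 13 GiB in float16 and a little over 3 GiB at 4 bits. That is consistent with roughly 30 GB of system RAM in use when the full model runs on CPU, and with two GPUs each holding about 9 GB when a half-precision model is split across them.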
Thanks @adjiap!! I only have a 4GB GPU. I'm using the CPU to run localGPT.py but the answers are slow as you pointed out. I have 64GB of RAM but I didn't notice any improvement when setting this param .
Thanks, this solved it for me.
There is a nice pip install syntax generator on the PyTorch website to ensure that you download torch with CUDA enabled. It takes care of different operating systems, versions of CUDA, etc., and generates the right pip install command and correct torch version for you.
In my case, for some reason, I had to force-reinstall the torch packages with the no-cache option:
pip install --force-reinstall --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
When I tried to pip install without it, conda gave me Requirement already satisfied (from the cache), so the CUDA-compiled torch was not installed.
Then the ingest.py used my GPU, on windows 11.
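The index URLs used in this thread (cu118 above, nightly/cu121 earlier) follow a visible pattern: "cu" followed by the CUDA major and minor version. A small sketch of building the reinstall command from a CUDA version string is shown below. Both function names are hypothetical, and whether wheels actually exist for a given CUDA version is an assumption that should be checked against the selector on pytorch.org.

```python
from typing import List


def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "11.8" to the matching PyTorch wheel index URL.

    (Hypothetical helper; it does not check that the index exists.)
    """
    parts = cuda_version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '11.8', got {cuda_version!r}")
    return f"https://download.pytorch.org/whl/cu{parts[0]}{parts[1]}"


def reinstall_command(cuda_version: str) -> List[str]:
    """Build the force-reinstall command from the comment above as an argument list."""
    return [
        "pip", "install", "--force-reinstall", "--no-cache-dir",
        "torch", "torchvision", "torchaudio",
        "--index-url", torch_index_url(cuda_version),
    ]


print(" ".join(reinstall_command("11.8")))
```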