
AssertionError: Torch not compiled with CUDA enabled

endolith opened this issue 1 year ago · 23 comments

I have PyTorch with CUDA enabled:

# Name                    Version                   Build  Channel
pytorch                   2.0.1           py3.11_cuda11.8_cudnn8_0    pytorch
pytorch-cuda              11.8                 h24eeafa_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch

This error message needs improvement. What is the actual problem?

requirements.txt needs to be updated to include the correct pytorch version?

OS Name Microsoft Windows 10 Pro Version 10.0.19045 Build 19045
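
For reference, a quick way to check whether the environment that actually runs the script imports that CUDA build (a CPU-only torch wheel pulled in by pip, e.g. via requirements.txt, can shadow the conda package):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# a CPU-only wheel prints something like "2.0.1+cpu None False";
# a CUDA build prints a CUDA version such as 11.8 and, if a GPU is visible, True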

endolith avatar Jun 16 '23 14:06 endolith

If I run python ingest.py --device_type cpu it works OK, but then when I try to run python run_localGPT.py --device_type cpu it still fails with AssertionError: Torch not compiled with CUDA enabled. So the README is incorrect when it says "if you do not have a GPU and want to run this on CPU, now you can do that". :/

endolith avatar Jun 16 '23 15:06 endolith

I have the exact same problem

ahmed240 avatar Jun 17 '23 17:06 ahmed240

@endolith I will have a look at the code and see what is causing this.

PromtEngineer avatar Jun 18 '23 05:06 PromtEngineer

I do get this error running on an Apple M1 with PyTorch compiled against MPS and running the script with python run_localGPT.py --device_type=mps

ChristianWeyer avatar Jun 18 '23 12:06 ChristianWeyer

I fixed this issue by installing requirements.txt inside a conda environment. I'll summarize the steps I followed below:

Install conda for your platform here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html

Then I ran the following commands from the YouTube video:

conda create -n localGPT
conda activate localGPT
# Your terminal line should now start with (localGPT)
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT
pip install -r .\requirements.txt

If it worked, the output of torch.cuda.is_available() should be True.

python3
Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

rsutton1 avatar Jun 19 '23 14:06 rsutton1

I do get this error running on an Apple M1 with PyTorch compiled against MPS and running the script with python run_localGPT.py --device_type=mps

Should I create a separate issue for this regarding the Apple M1/M2 MPS support? @PromtEngineer

ChristianWeyer avatar Jun 19 '23 17:06 ChristianWeyer

@endolith I will have a look at the code and see what is causing this.

Here is the traceback:

python run_localGPT.py --device_type=cpu

(localgpt) λ python ingest.py --device_type=cpu
2023-06-19 15:05:22,591 - INFO - ingest.py:107 - Loading documents from D:\Documents\localgpt/SOURCE_DOCUMENTS
2023-06-19 15:05:27,798 - INFO - ingest.py:111 - Loaded 1 documents from D:\Documents\localgpt/SOURCE_DOCUMENTS
2023-06-19 15:05:27,800 - INFO - ingest.py:112 - Split into 72 chunks of text
2023-06-19 15:05:34,645 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-06-19 15:05:41,533 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-06-19 15:05:42,666 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: D:\Documents\localgpt/DB
2023-06-19 15:05:42,738 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-06-19 15:05:42,762 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-06-19 15:05:42,997 - INFO - duckdb.py:460 - loaded in 288 embeddings
2023-06-19 15:05:42,999 - INFO - duckdb.py:472 - loaded in 1 collections
2023-06-19 15:05:43,003 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-06-19 15:07:07,517 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB
2023-06-19 15:07:07,730 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB
(localgpt) λ python run_localGPT.py --device_type=cpu
2023-06-19 15:10:45,346 - INFO - run_localGPT.py:161 - Running on: cpu
2023-06-19 15:10:45,347 - INFO - run_localGPT.py:162 - Display Source Documents set to: False
2023-06-19 15:10:45,899 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-06-19 15:10:49,728 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-06-19 15:10:49,763 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: D:\Documents\localgpt/DB
2023-06-19 15:10:49,827 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-06-19 15:10:49,850 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-06-19 15:10:49,943 - INFO - duckdb.py:460 - loaded in 360 embeddings
2023-06-19 15:10:49,946 - INFO - duckdb.py:472 - loaded in 1 collections
2023-06-19 15:10:49,948 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-06-19 15:10:49,949 - INFO - run_localGPT.py:43 - Loading Model: TheBloke/WizardLM-7B-uncensored-GPTQ, on: cpu
2023-06-19 15:10:49,950 - INFO - run_localGPT.py:44 - This action can take a few minutes!
2023-06-19 15:10:49,950 - INFO - run_localGPT.py:49 - Using AutoGPTQForCausalLM for quantized models
2023-06-19 15:10:50,912 - INFO - run_localGPT.py:56 - Tokenizer loaded
2023-06-19 15:10:51,225 - INFO - _base.py:727 - lm_head not been quantized, will be ignored when make_quant.
2023-06-19 15:10:51,230 - WARNING - qlinear_old.py:16 - CUDA extension not installed.
2023-06-19 15:10:52,316 - WARNING - modeling.py:1035 - The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2023-06-19 15:10:52,331 - WARNING - modeling.py:928 - The safetensors archive passed at C:\Users\endolith/.cache\huggingface\hub\models--TheBloke--WizardLM-7B-uncensored-GPTQ\snapshots\dcb3400039f15cff76b43a4921c59d47c5fc2252\WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Traceback (most recent call last):
  File "D:\Documents\localgpt\run_localGPT.py", line 228, in <module>
    main()
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\localgpt\run_localGPT.py", line 197, in main
    llm = load_model(device_type, model_id=model_id, model_basename=model_basename)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Documents\localgpt\run_localGPT.py", line 58, in load_model
    model = AutoGPTQForCausalLM.from_quantized(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\auto_gptq\modeling\auto.py", line 82, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\auto_gptq\modeling\_base.py", line 773, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\accelerate\utils\modeling.py", line 1094, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\accelerate\utils\modeling.py", line 946, in load_state_dict
    return safe_load_file(checkpoint_file, device=list(device_map.values())[0])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\safetensors\torch.py", line 261, in load_file
    result[k] = f.get_tensor(k)
                ^^^^^^^^^^^^^^^
  File "C:\Users\endolith\anaconda3\envs\localgpt\Lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
2023-06-19 15:10:52,373 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: D:\Documents\localgpt/DB

endolith avatar Jun 19 '23 19:06 endolith

Some updates and partial success on my M1:

"cuda" is hardcoded in https://github.com/PromtEngineer/localGPT/blob/979f912d07d40704d105c92b4f20a6a5b8df0c6a/run_localGPT.py#L63

This probably should take the device_type as an input. @PromtEngineer
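
Something along these lines is what I mean (just a sketch, assuming device_type is already passed into load_model; the exact from_quantized() keyword names may differ between auto_gptq versions):

# sketch only: pass the CLI device_type through instead of the hardcoded "cuda:0"
device = "cuda:0" if device_type == "cuda" else device_type  # e.g. "cpu" or "mps"
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename=model_basename,
    use_safetensors=True,
    device=device,
    use_triton=False,
)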

I changed this locally and it starts, but then I run into autocast issues:

2023-06-19 21:04:34,844 - INFO - run_localGPT.py:108 - Local LLM Loaded

Enter a query: what is the senate?
Traceback (most recent call last):
  File "/Users/christianweyer/Sources/localGPT/run_localGPT.py", line 228, in <module>
    main()
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/Sources/localGPT/run_localGPT.py", line 206, in main
    res = qa(query)
          ^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 120, in _call
    answer = self.combine_documents_chain.run(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 239, in run
    return self(kwargs, callbacks=callbacks)[self.output_keys[0]]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
    output, extra_return_dict = self.combine_docs(
                                ^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/combine_documents/stuff.py", line 87, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 213, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/base.py", line 134, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 69, in _call
    response = self.generate([inputs], run_manager=run_manager)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/chains/llm.py", line 79, in generate
    return self.llm.generate_prompt(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 134, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 191, in generate
    raise e
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 185, in generate
    self._generate(prompts, stop=stop, run_manager=run_manager)
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/base.py", line 436, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager)
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/langchain/llms/huggingface_pipeline.py", line 168, in _call
    response = self.pipeline(prompt)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 201, in __call__
    return super().__call__(text_inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1120, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1127, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1026, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/transformers/pipelines/text_generation.py", line 263, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 422, in generate
    with torch.inference_mode(), torch.amp.autocast(device_type=self.device.type):
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/christianweyer/miniconda3/envs/localGPT/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 218, in __init__
    raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

Trying to chase this one now...
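
One possible local workaround is to patch auto_gptq's generate() so it only uses autocast on device types that torch.amp.autocast accepts (a sketch only; it doesn't mean the GPTQ kernels themselves support MPS):

# in auto_gptq/modeling/_base.py, roughly where the traceback points
import contextlib
import torch

autocast_ctx = (
    torch.amp.autocast(device_type=self.device.type)
    if self.device.type in ("cuda", "cpu")
    else contextlib.nullcontext()
)
with torch.inference_mode(), autocast_ctx:
    ...  # existing generate() body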

ChristianWeyer avatar Jun 19 '23 19:06 ChristianWeyer

@ChristianWeyer this seems to be a bug, thanks for highlighting it. I am not sure if auto_gptq supports M1/M2. Will need to test that.

PromtEngineer avatar Jun 20 '23 01:06 PromtEngineer

@ChristianWeyer this seems to be a bug, thanks for highlighting it. I am not sure if auto_gptq supports M1/M2. Will need to test that.

Seems it does not: https://github.com/PanQiWei/AutoGPTQ/issues/133#issuecomment-1575002893

Which means we cannot use LocalGPT on M1/M2 with quantized models for now. Thanks!

ChristianWeyer avatar Jun 20 '23 06:06 ChristianWeyer

@ChristianWeyer I finally got an M2 and just tested it, that is the case. Need to figure out if there is another way.

PromtEngineer avatar Jun 20 '23 06:06 PromtEngineer

BTW @PromtEngineer: the current code checks for CUDA explicitly for full models, which makes it unusable for MPS: https://github.com/PromtEngineer/localGPT/blob/main/run_localGPT.py#L68

ChristianWeyer avatar Jun 20 '23 07:06 ChristianWeyer

@PromtEngineer did you find the error that @endolith mentioned earlier? Even though I have a conda environment, I still get AssertionError: Torch not compiled with CUDA enabled when I run python run_localGPT.py --device_type cpu

VTaPo avatar Jun 21 '23 18:06 VTaPo

Yeah, same here. Conda on Windows, CPU.

michaelchenjana avatar Jun 22 '23 05:06 michaelchenjana

I did have the Torch not compiled with CUDA enabled error on my Windows 11 machine with an Nvidia RTX 4070, using conda. I made sure I was in the conda env, then ran:

pip install torch===2.0.1+cu118 torchvision===0.15.2 -f https://download.pytorch.org/whl/torch_stable.html

This solved my issue.

OssBozier avatar Jun 23 '23 03:06 OssBozier

@ChristianWeyer I finally got an M2 and just tested it, that is the case. Need to figure out if there is another way.

Do you already have an idea here? Thx!

ChristianWeyer avatar Jun 25 '23 12:06 ChristianWeyer

@OssBozier @mindwellsolutions Are you trying to run it with CUDA, though? My GPU doesn't have enough memory so I'm trying to run it without. Yours might count as a different bug? Needs that particular version added to requirements.txt?

(Actually I guess in my original comment I was trying to run it with CUDA, so maybe my second comment is what should be in a separate bug.)

endolith avatar Jun 25 '23 20:06 endolith

I was just trying to get it to work at all. I am happy to run it with my GPU as I have enough GPU memory. I had only read your first comment before responding here. I do see now from your second post that you want to run the CPU option. Not sure how to help with that one.

OssBozier avatar Jun 25 '23 21:06 OssBozier

Just pushed a fix for it. Let me know if there is still the same issue.

BTW @PromtEngineer: the current code checks for CUDA explicitly for full models, which makes it unusable for MPS: https://github.com/PromtEngineer/localGPT/blob/main/run_localGPT.py#L68

For my M2, I get better performance with LlamaForCausalLM compared to AutoForCausalLM. That's why I set it up to use LlamaForCausalLM for both cpu and mps. There was a bug, and it should now be resolved.
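
Roughly the idea (a sketch, not the exact code in the repo):

# cpu / mps: load the full (non-quantized) model via LlamaForCausalLM;
# cuda keeps using the GPTQ-quantized path via AutoGPTQForCausalLM
from transformers import LlamaForCausalLM, LlamaTokenizer

if device_type in ("cpu", "mps"):
    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(model_id)
    model.to(device_type)
else:
    ...  # AutoGPTQForCausalLM.from_quantized(...) as before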

PromtEngineer avatar Jun 27 '23 02:06 PromtEngineer

If I run python run_localGPT.py --device_type=cpu I no longer get the CUDA error, but I then get repeated TimeoutError: The read operation timed out while trying to download the vicuña model.

urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.

So then I manually downloaded the .bin files and tried to put them in the huggingface hub cache folder, but then I get

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 180355072 bytes.
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 90177536 bytes.

etc.

So I think it's just not going to work. https://github.com/PromtEngineer/localGPT/discussions/186

endolith avatar Jun 28 '23 13:06 endolith

What worked for me: Windows 10, RTX 3080, CUDA 12.1, Python 3.10, pip 23.1.2, running inside a virtual env in PowerShell. I installed everything as told in the tutorial: pip install -r requirements.txt

--check that CUDA is installed; the CUDA version should be visible in the top right corner

nvidia-smi.exe

--check if torch is installed

pip list | findstr torch
--if you see something like torch 2.0.1 and torchvision 0.15.2, then torch was probably compiled without CUDA
--check with another method; if the output is False, torch wasn't compiled with CUDA
python -c 'import torch; print(torch.cuda.is_available())'

--clean up

pip uninstall torch torchvision torchaudio
pip cache purge
pip list | findstr torch

--go to this site and get the proper command for your system and CUDA installation: https://pytorch.org/get-started/locally/

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
--this downloaded and installed torch with CUDA

--check again; the output should be True

pip list | findstr torch
python -c 'import torch; print(torch.cuda.is_available())'

--try to ingest

python ingest.py

Cayyro avatar Jun 29 '23 20:06 Cayyro

What worked for me: Windows 10, RTX 3080, CUDA 12.1, Python 3.10, pip 23.1.2, running inside a virtual env in PowerShell. [...]

I tried this and it worked, thanks for defining it step by step!

But now I have the next problem...I was able to ingest the document, but I couldn't run it due to

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 124.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 6.95 GiB is allocated by PyTorch, and 289.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried changing the EMBEDDING_MODEL_NAME to instructor-base, but it seems it's not yet small enough.

I have an RTX 3070, so only 8 GB of VRAM unfortunately... does anyone know which instructor model would work, or something else that might be done?
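
The max_split_size_mb hint from the error message can at least be tried by setting PYTORCH_CUDA_ALLOC_CONF before torch initializes CUDA (128 is just an example value; it only helps with fragmentation, not with a model that simply doesn't fit in 8 GB):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # set before any CUDA allocation
import torch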

adjiap avatar Jul 01 '23 06:07 adjiap

@adjiap In your case the bottleneck is the RAM you have. You probably want to set the RAM here.

PromtEngineer avatar Jul 01 '23 06:07 PromtEngineer

Hi @adjiap! Were you able to solve the memory problem? Thanks in advance

laurafbec avatar Jul 16 '23 08:07 laurafbec

Hi @adjiap! Were you able to solve the memory problem? Thanks in advance

I'm rather new to the intricacies of machine learning models and embeddings, but I'm getting to learn a few things here :)

In the end I was able to run the project by using the base instructor model (on my GPU) for ingest.py, but running run_localGPT.py on the CPU.

This brings me to the next problem of having too little RAM, as Vicuna-7B takes about 30 GB of my 32 GB of RAM (not GPU VRAM, btw). Though it works, answering questions is really slow. I haven't tried PromtEngineer's suggestion about setting the RAM there (as I'm not sure yet what the argument actually does), because Vicuna-7B is, afaik, about 30 GB, and limiting it to something smaller, like 5 GB, would probably not work as intended.

A colleague of mine helped me out with his machine with dual RTX 2080 Ti cards, with 12 GB of VRAM each, and he was able to run ingest.py and run_localGPT.py with no issue, though he did show me that while run_localGPT.py was running, both of his GPUs were holding a 9 GB load.

tl;dr: Vicuna-7B plus the large instructor model don't work without at least 20 GB of GPU VRAM in total. The embedding step (ingest.py) would still work if I used the base instructor, but the actual model execution doesn't.

adjiap avatar Jul 16 '23 13:07 adjiap

Thanks @adjiap!! I only have a 4 GB GPU. I'm using the CPU to run run_localGPT.py, but the answers are slow, as you pointed out. I have 64 GB of RAM, but I didn't notice any improvement when setting this param.

laurafbec avatar Jul 17 '23 08:07 laurafbec

Thanks, this solved it for me.

jamiejk avatar Jul 18 '23 22:07 jamiejk

There is a nice pip install command generator on the PyTorch website to ensure that you download torch with CUDA enabled... it takes care of different operating systems, versions of CUDA, etc., and generates the right pip install command and the correct torch version for you.

neural-oracle avatar Jul 28 '23 08:07 neural-oracle

In my case, for some reason, I had to force-reinstall the torch packages with the no-cache option:

pip install --force-reinstall --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Trying to pip install without it, pip reported Requirement already satisfied from its cache, so the CUDA-compiled torch was not installed.

Then ingest.py used my GPU on Windows 11.

abonfo avatar Sep 03 '23 19:09 abonfo