gpt4all
Run on GPU - can't import GPT4AllGPU
Your instructions on how to run it on GPU are not working for me:
# rungptforallongpu.py
import torch
from transformers import LlamaTokenizer
from nomic.gpt4all import GPT4AllGPU # this fails, copy/pasted that class into this script
LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"
tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
m = GPT4AllGPU(LLAMA_PATH)
config = {'num_beams': 2,
          'min_new_tokens': 10,
          'max_length': 100,
          'repetition_penalty': 2.0}
out = m.generate('write me a story about a lonely computer', config)
print(out)
from nomic.gpt4all import GPT4AllGPU fails
ImportError: cannot import name 'GPT4AllGPU' from 'nomic.gpt4all' (F:\GPT4ALLGPU\nomic\nomic\gpt4all\__init__.py)
(I can import the GPT4All class from that file OK, so I know my path is correct.) If I copy/paste the GPT4AllGPU class into my own Python script file, that seems to fix that.
Could you suggest a compatible Llama 7B model and a compatible Llama tokenizer pretrained file? It seems to expect both, but I think the random ones I'm using may not be working. Is this like Stable Diffusion, where a textual inversion has to be trained on that exact model for them to work together, or should any Llama 7B model work with any Llama 7B pretrained tokenizer, like I did here?
I tried cloning https://huggingface.co/decapoda-research/llama-7b-hf as my LLAMA_PATH. (And tweaking 2 json files that had it spelled as "LLaMA" and not "Llama")
And cloned https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer/tree/main and set that as my LLAMA_TOKENIZER_PATH
When I run my rungptforallongpu.py script it says "Loading checkpoint shards", which completes successfully (100%, 33/33), but then it fails with a ZeroDivisionError: integer division or modulo by zero on m = GPT4AllGPU(LLAMA_PATH). The traceback points at Python310\lib\site-packages\peft\peft_model.py:167 in from_pretrained and then at site-packages\accelerate\utils\modeling.py, get_balanced_memory:
We can't just set the memory to model_size // num_devices as it will end being too [...] slightly less layers and some layers will end up offload at the end. So this funct...
So I'm probably doing something wrong, hope someone can tell me what...
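For what it's worth, one way to check whether the checkpoint and tokenizer themselves load correctly, independent of GPT4AllGPU, is to try them with plain transformers. This is a rough sketch, assuming the standard LlamaForCausalLM API and the same local paths as above; device_map="auto" needs the accelerate package installed:

# sketch: load the converted HF llama checkpoint directly, bypassing GPT4AllGPU/peft
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"

tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
model = LlamaForCausalLM.from_pretrained(
    LLAMA_PATH,
    torch_dtype=torch.float16,   # halves the weight size versus fp32
    device_map="auto",           # let accelerate place layers on GPU/CPU
)

inputs = tokenizer("write me a story about a lonely computer", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_length=100, num_beams=2, repetition_penalty=2.0)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If that works, the model and tokenizer are fine and the problem is in the GPT4AllGPU/peft path; if it fails the same way, the checkpoint itself is the issue.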
Could you in general make this part of the ReadMe instructions a bit clearer please?
The CPU version is running fine via >gpt4all-lora-quantized-win64.exe (but a little slow and the PC fan is going nuts), so I'd like to use my GPU if I can - and then figure out how I can custom train this thing :).
- Win11
- Torch 2.0.0
- CUDA 11.7 (I confirmed that torch can see CUDA)
- Python 3.10.10
- 8GB GeForce 3070
- 32GB RAM
I'm sorry to hear that you're having trouble running the GPT4AllGPU script on your GPU. Here are some suggestions that may help:
Make sure you have installed the required packages: Before running the GPT4AllGPU script, make sure you have installed the required packages such as torch, transformers, and accelerate. You can use the following command to install them:
pip install torch transformers accelerate
Check your GPU configuration:
Make sure that your GPU is properly configured and that you have the necessary drivers installed. You can verify this by running the following command:
nvidia-smi
This should display information about your GPU, including the driver version.
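Alternatively, a quick sanity check from Python itself (a minimal sketch; it only assumes torch is installed):

# quick check that PyTorch can actually see the GPU
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))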
Use a compatible Llama 7B model and tokenizer: It is important to use a compatible Llama 7B model and tokenizer. You can download the Llama 7B model from Hugging Face using the following command:
transformers-cli download decapoda-research/llama-7b-hf
You can download the Llama 7B tokenizer from Hugging Face using the following command:
transformers-cli download HuggingFaceM4/llama-7b-tokenizer
Try a different batch size:
The ZeroDivisionError you're encountering could be caused by an invalid batch size. You may want to experiment with different batch sizes to see if that resolves the issue. You can do this by changing the batch_size parameter in the GPT4AllGPU constructor.
Upgrade your version of PyTorch: The version of PyTorch you are using (2.0.0) is quite old. You may want to upgrade to a newer version to see if that resolves the issue. You can upgrade PyTorch using the following command:
pip install torch --upgrade
I hope this helps! If you continue to experience issues, please let me know and I'll do my best to assist you further.
Thank you for your answer!
I have all the required pip packages installed.
(I think torch 2.0.0 is actually the latest version; see also https://pytorch.org/.) That's also the version of PyTorch that works with the CUDA version my GPU supports, so I'd rather not change it.
PyTorch version: 2.0.0+cu117
Can Torch see CUDA? True
CUDA version: 11.7
CUDNN version: 8500
Available GPU devices: 1
Device Name: NVIDIA GeForce RTX 3070
I'll try redownloading models/tokenizer with the transformers-cli (but I'll wait for someone to confirm that model/tokenizer should actually work).
Try
from nomic.gpt4all.gpt4all import GPT4AllGPU
The information in the readme is incorrect I believe.
Bless your heart @benninkcorien, explaining to three eight-year-olds in a trench coat that just because a response is structured like constructive feedback doesn't actually make it sound advice. Fine-tune and validate your results with a third-party resource, boys and girls.
@loanmaster, thank you for your suggestion to use this line.
After @loanMaster's help, I have this error: It looks like the config file at 'LLAMA_PATH' is not a valid JSON file.
@fzorrilla-ml While I'm unlikely to be much help myself, I did leave this tab open, and I can tell you that whoever can help you will want to know: whether you're trying to run a local model or a remote one from HF, which one it is, whether it's one of the prescribed compatible ones, and, if it's local, whether you altered any of its files... probably about as much information as you consider relevant, so there's at least some context in which to consider the error.
Is there any limit on the minimum VRAM required? I mean, I run it on a GTX 1050 with 4GB VRAM and it's also producing that division-by-zero error.
These guys need to update their guides on the GPU sample. I've been poking around just to get the correct dependencies, and after it was all finally connected (not using your steps, though, since apparently they don't work on Windows 10), it produced that Division by Zero error, haha.
Thanks @winisoft !
I'm trying to run the example for GPU locally because my current CPU is not supported by the GPT4All binaries.
The reason my CPU isn't supported is that the GPT4All binaries are compiled with AVX2 instructions and my CPU only has AVX.
Using the guide in Alpaca.cpp and modifying the make configuration file, I was able to compile the chat example so that it only uses AVX.
I'm working on Ubuntu 22.04 LTS with 16 GB RAM, an Intel i7 3770S, and an NVIDIA GTX 1660 Super with 6GB VRAM.
The model that I am passing as a parameter to GPT4AllGPU is the one provided in the project link.
rungptforallongpu.py
@benninkcorien Looks like you're way ahead of me at trying to run this on a GPU. Where did you find this rungptforallongpu.py file? It's not in either the gpt4all or nomic repos.
@benninkcorien Looks like you're way ahead of me at trying to run this on a GPU. Where did you find this rungptforallongpu.py file? It's not in either the gpt4all or nomic repos.
It's in benninkcorien’s first post:
Your instructions on how to run it on GPU are not working for me:
# rungptforallongpu.py
To replicate, copy the Python code included in benninkcorien's post and paste it into a file named rungptforallongpu.py.
On a related matter, I'm not sure how far we can get, given the unresolvable, hard-coded LoRA path in the nomic code, which, by the looks of it, could be a reference to a subsequently-removed nomic-ai section on Hugging Face:
self.lora_path = 'nomic-ai/vicuna-lora-multi-turn_epoch_2'
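A quick way to confirm whether that repo is still reachable on the Hub (a small sketch using huggingface_hub, which a transformers install already pulls in; HfApi().model_info raises if the repo is gone):

# check whether the hard-coded LoRA repo still exists on the Hugging Face Hub
from huggingface_hub import HfApi

try:
    info = HfApi().model_info("nomic-ai/vicuna-lora-multi-turn_epoch_2")
    print("repo found:", info.modelId)
except Exception as err:  # RepositoryNotFoundError, network errors, ...
    print("repo not reachable:", err)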
Hello, if you are still having problems importing GPT4AllGPU, here's the solution:
There is no reference to the GPT4AllGPU class in the file nomic/gpt4all/__init__.py.
After adding it to the imports there, the problem went away.
Just make sure your init file looks like this:
from .gpt4all import GPT4All, GPT4AllGPU, prompt
transformers-cli download decapoda-research/llama-7b-hf
No idea what transformers-cli is.. Does anyone have a URL?
I was able to DL the llama files with:
pacman -Suy git-lfs ## package install for Arch Linux, like rpm or apt
git lfs install
git clone https://huggingface.co/decapoda-research/llama-7b-hf
git clone https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer
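If you'd rather stay in Python, the same repos can also be fetched with huggingface_hub instead of git-lfs (a sketch; snapshot_download mirrors a repo into the local HF cache and returns the local path):

# download the model and tokenizer repos without git-lfs or transformers-cli
from huggingface_hub import snapshot_download

model_dir = snapshot_download("decapoda-research/llama-7b-hf")
tokenizer_dir = snapshot_download("HuggingFaceM4/llama-7b-tokenizer")
print(model_dir, tokenizer_dir, sep="\n")

You can then point LLAMA_PATH and LLAMA_TOKENIZER_PATH at the printed directories.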
self.lora_path = 'nomic-ai/vicuna-lora-multi-turn_epoch_2'
I'm stuck here too because of this missing file...
I don't think we can run GPT4All on GPU on Windows at this point. It seems to require the deepspeed package.
And you cannot install deepspeed on Windows: while they say it is partially supported, building the wheel in the dist folder as per their instructions errors out with fatal error LNK1181: cannot open input file 'aio.lib', which seems to be caused by the aio library not being supported on Windows.
I'm giving up. CPU it is.
Do we really need the LLAMA_TOKENIZER_PATH? I can't see it being used in the readme, @benninkcorien.
Does the model path mentioned in the readme have to be a huggingface model? If so, what does downloading the binary do?
Does the model path mentioned in the readme have to be a huggingface model? If so, what does downloading the binary do?
I have the same question
The tokenizer is not used anywhere, and self.lora_path points to nowhere. This is depressing, man.
I understand your frustration. It can be difficult to get a new language model working, especially when the instructions are not clear.
I've taken a look at your code and the GPT4AllGPU documentation, and I think I know what's going on. The problem is that you're trying to use a 7B parameter model on a GPU with only 8GB of memory. This is simply not enough memory to run the model.
The GPT4AllGPU documentation states that the model requires at least 12GB of GPU memory. If you want to use the model on a GPU with less memory, you'll need to reduce the model size. You can do this by using the -model_size flag when you download the model. For example, to download a 3B parameter model, you would use the following command:
wget https://huggingface.co/decapoda-research/llama-3b-hf -O llama-3b-hf.zip
Once you have downloaded the model, you can unzip it and then follow the instructions in the GPT4AllGPU documentation to install it.
Once the model is installed, you should be able to run it on your GPU without any problems.
Here are some additional tips for running GPT4AllGPU on a GPU:
- Make sure that your GPU driver is up to date.
- Use a recent version of Python.
- Install the latest version of PyTorch.
- Allocate enough memory for the model.
- Use a fast SSD to store the model.
I hope this helps!
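For a rough sense of scale, here is a back-of-the-envelope sketch of the VRAM needed just to hold the weights of a 7B-parameter model (ignoring activations and the KV cache):

# approximate memory needed just for a 7B-parameter model's weights
params = 7_000_000_000
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1024**3:.1f} GiB")

That comes to roughly 26 GiB in fp32 and 13 GiB in fp16, which is why an 8 GB card cannot hold the full model without offloading or quantization.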
The problem is that you're trying to use a 7B parameter model on a GPU with only 8GB of memory.
I have an RTX3090 with 24GB of VRAM, and 128GB of DRAM running Arch Linux.
There is another bug where it complains about offload_folder, saying you need to specify it.
Very buggy code
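If that complaint is the usual accelerate message asking for an offload folder, it can often be addressed by passing one through from_pretrained. A sketch, assuming the model is loaded via transformers with device_map="auto"; "offload" here is just a placeholder for any writable directory:

# give accelerate somewhere to spill layers that don't fit in VRAM
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # or a local clone of it
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="offload",         # any writable directory
)

If the loading happens inside GPT4AllGPU rather than your own code, the same kwarg would have to be added where the nomic code calls from_pretrained.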
I'm sorry to hear that you're having trouble running the GPT4AllGPU script on your GPU. Here are some suggestions that may help ... If you continue to experience issues, please let me know and I'll do my best to assist you further.
Ok, just out of curiosity: is your response AI generated? No offense intended! I mean it as a compliment; it's very long and very polite. Almost too polite... 😄
Stale, please open a new issue if this still occurs