gpt4all
Run on GPU - can't import GPT4AllGPU
Your instructions on how to run it on GPU are not working for me:
# rungptforallongpu.py
import torch
from transformers import LlamaTokenizer
from nomic.gpt4all import GPT4AllGPU # this fails, copy/pasted that class into this script
LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"
tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
m = GPT4AllGPU(LLAMA_PATH)
config = {'num_beams': 2,
          'min_new_tokens': 10,
          'max_length': 100,
          'repetition_penalty': 2.0}
out = m.generate('write me a story about a lonely computer', config)
print(out)
from nomic.gpt4all import GPT4AllGPU fails
ImportError: cannot import name 'GPT4AllGPU' from 'nomic.gpt4all' (F:\GPT4ALLGPU\nomic\nomic\gpt4all\__init__.py)
(I can import the GPT4All class from that file OK, so I know my path is correct.) If I copy/paste the GPT4AllGPU class into my own Python script file, that seems to fix that.
Could you suggest a compatible Llama 7B model and a compatible Llama tokenizer pretrained file? It seems to expect both, but I think the random ones I'm using may not be working. Is this like Stable Diffusion, where a textual inversion has to be trained on that exact model for them to work together, or should any Llama 7B model work with any Llama 7B pretrained tokenizer, like I did here?
I tried cloning https://huggingface.co/decapoda-research/llama-7b-hf as my LLAMA_PATH. (And tweaking 2 json files that had it spelled as "LLaMA" and not "Llama")
And cloned https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer/tree/main and set that as my LLAMA_TOKENIZER_PATH
When I run my rungptforallongpu.py script it says "Loading checkpoint shards", which completes successfully (100%, 33/33), but then it fails with a ZeroDivisionError: integer division or modulo by zero on m = GPT4AllGPU(LLAMA_PATH). The traceback points at Python310\lib\site-packages\peft\peft_model.py:167 in from_pretrained and then at site-packages\accelerate\utils\modeling.py, get_balanced_memory:
We can't just set the memory to model_size // num_devices as it will end being too [...] slightly less layers and some layers will end up offload at the end. So this funct...
So I'm probably doing something wrong, hope someone can tell me what...
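For what it's worth, one way to check whether the checkpoint and tokenizer themselves load correctly, independent of GPT4AllGPU, is to try them with plain transformers. This is a rough sketch, assuming the standard LlamaForCausalLM API and the same local paths as above; device_map="auto" needs the accelerate package installed:

# sketch: load the converted HF llama checkpoint directly, bypassing GPT4AllGPU/peft
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"

tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
model = LlamaForCausalLM.from_pretrained(
    LLAMA_PATH,
    torch_dtype=torch.float16,   # halves the weight size versus fp32
    device_map="auto",           # let accelerate place layers on GPU/CPU
)

inputs = tokenizer("write me a story about a lonely computer", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_length=100, num_beams=2, repetition_penalty=2.0)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If that works, the model and tokenizer are fine and the problem is in the GPT4AllGPU/peft path; if it fails the same way, the checkpoint itself is the issue.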
Could you in general make this part of the ReadMe instructions a bit clearer please?
The CPU version is running fine via >gpt4all-lora-quantized-win64.exe (but a little slow and the PC fan is going nuts), so I'd like to use my GPU if I can - and then figure out how I can custom train this thing :).
- Win11
- Torch 2.0.0
- CUDA 11.7 (I confirmed that torch can see CUDA)
- Python 3.10.10
- 8GB GeForce 3070
- 32GB RAM
I'm sorry to hear that you're having trouble running the GPT4AllGPU script on your GPU. Here are some suggestions that may help:
Make sure you have installed the required packages: Before running the GPT4AllGPU script, make sure you have installed the required packages such as torch, transformers, and accelerate. You can use the following command to install them:
pip install torch transformers accelerate
Check your GPU configuration:
Make sure that your GPU is properly configured and that you have the necessary drivers installed. You can verify this by running the following command:
nvidia-smi
This should display information about your GPU, including the driver version.
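Alternatively, a quick sanity check from Python itself (a minimal sketch; it only assumes torch is installed):

# quick check that PyTorch can actually see the GPU
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))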
Use a compatible Llama 7B model and tokenizer: It is important to use a compatible Llama 7B model and tokenizer. You can download the Llama 7B model from Hugging Face using the following command:
transformers-cli download decapoda-research/llama-7b-hf
You can download the Llama 7B tokenizer from Hugging Face using the following command:
transformers-cli download HuggingFaceM4/llama-7b-tokenizer
Try a different batch size:
The ZeroDivisionError you're encountering could be caused by an invalid batch size. You may want to experiment with different batch sizes to see if that resolves the issue. You can do this by changing the batch_size parameter in the GPT4AllGPU constructor.
Upgrade your version of PyTorch: The version of PyTorch you are using (2.0.0) is quite old. You may want to upgrade to a newer version to see if that resolves the issue. You can upgrade PyTorch using the following command:
pip install torch --upgrade
I hope this helps! If you continue to experience issues, please let me know and I'll do my best to assist you further.
Thank you for your answer!
I have all the required pip packages installed.
(I think torch 2.0.0 is actually the latest version; see also https://pytorch.org/.) That's also the version of PyTorch that works with the CUDA version my GPU supports, so I'd rather not change it.
PyTorch version: 2.0.0+cu117
Can Torch see CUDA? True
CUDA version: 11.7
CUDNN version: 8500
Available GPU devices: 1
Device Name: NVIDIA GeForce RTX 3070
I'll try redownloading models/tokenizer with the transformers-cli (but I'll wait for someone to confirm that model/tokenizer should actually work).
Try
from nomic.gpt4all.gpt4all import GPT4AllGPU
The information in the readme is incorrect I believe.
Bless your heart @benninkcorien, explaining to three eight-year-olds in a trench coat that just because a response is structured like constructive feedback doesn't actually make it sound advice. Fine-tune and validate your results with a third-party resource, boys and girls.
@loanmaster, thank you for your suggestion to use this line.
After @loanMaster's help, I have this error: It looks like the config file at 'LLAMA_PATH' is not a valid JSON file.
@fzorrilla-ml While I'm unlikely to be much help myself, I did leave this tab open, and I can tell you that whoever can help you will want to know: whether you're trying to run a local model or a remote one from HF, which one it is, whether it's one of the prescribed compatible ones, and, if it's local, whether you altered any of its files... probably about as much information as you consider relevant, so there's at least some context in which to consider the error.
Is there any limit on the minimum VRAM required? I mean, I run it on a GTX 1050 with 4GB VRAM and it's also producing that division-by-zero error.
These guys need to update their guides on the GPU sample. I've been poking around just to get the correct dependencies, and after it was all finally connected (not using your steps, though, since apparently they don't work on Windows 10), it produced that Division by Zero error, haha.
Thanks @winisoft !
I'm trying to run the example for GPU locally because my current CPU is not supported by the GPT4All binaries.
The reason my CPU isn't supported is that the GPT4All binaries are compiled with AVX2 instructions and my CPU only has AVX.
Using the guide in Alpaca.cpp and modifying the make configuration file, I was able to compile the chat example so that it only uses AVX.
I'm working on Ubuntu 22.04 LTS with 16 GB RAM, an Intel i7 3770S, and an NVIDIA GTX 1660 Super with 6GB VRAM.
The model that I am passing as a parameter to GPT4AllGPU is the one provided in the project link.
rungptforallongpu.py
@benninkcorien Looks like you're way ahead of me at trying to run this on a GPU. Where did you find this rungptforallongpu.py file? It's not in either the gpt4all or nomic repos.
@benninkcorien Looks like you're way ahead of me at trying to run this on a GPU. Where did you find this rungptforallongpu.py file? It's not in either the gpt4all or nomic repos.
It's in benninkcorien’s first post:
Your instructions on how to run it on GPU are not working for me:
# rungptforallongpu.py
To replicate, copy the Python code included in benninkcorien's post and paste it into a file named rungptforallongpu.py.
On a related matter, I'm not sure how far we can get, given the unresolvable, hard-coded LoRA path in the nomic code, which, by the looks of it, could be a reference to a subsequently-removed nomic-ai section on Hugging Face:
self.lora_path = 'nomic-ai/vicuna-lora-multi-turn_epoch_2'
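A quick way to confirm whether that repo is still reachable on the Hub (a small sketch using huggingface_hub, which a transformers install already pulls in; HfApi().model_info raises if the repo is gone):

# check whether the hard-coded LoRA repo still exists on the Hugging Face Hub
from huggingface_hub import HfApi

try:
    info = HfApi().model_info("nomic-ai/vicuna-lora-multi-turn_epoch_2")
    print("repo found:", info.modelId)
except Exception as err:  # RepositoryNotFoundError, network errors, ...
    print("repo not reachable:", err)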
Hello, if you are still having problems importing GPT4AllGPU, here's the solution:
There is no reference to the GPT4AllGPU class in the file nomic/gpt4all/__init__.py.
After adding it to the imports there, the problem went away.
Just make sure your init file looks like this:
from .gpt4all import GPT4All, GPT4AllGPU, prompt
transformers-cli download decapoda-research/llama-7b-hf
No idea what transformers-cli is.. Does anyone have a URL?
I was able to DL the llama files with:
pacman -Suy git-lfs ## package install for Arch Linux, like rpm or apt
git lfs install
git clone https://huggingface.co/decapoda-research/llama-7b-hf
git clone https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer
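If you'd rather stay in Python, the same repos can also be fetched with huggingface_hub instead of git-lfs (a sketch; snapshot_download mirrors a repo into the local HF cache and returns the local path):

# download the model and tokenizer repos without git-lfs or transformers-cli
from huggingface_hub import snapshot_download

model_dir = snapshot_download("decapoda-research/llama-7b-hf")
tokenizer_dir = snapshot_download("HuggingFaceM4/llama-7b-tokenizer")
print(model_dir, tokenizer_dir, sep="\n")

You can then point LLAMA_PATH and LLAMA_TOKENIZER_PATH at the printed directories.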
self.lora_path = 'nomic-ai/vicuna-lora-multi-turn_epoch_2'
I'm stuck here too because of this missing file...
I don't think we can run GPT4All on GPU on Windows at this point. It seems to require the deepspeed package.
And you cannot install deepspeed on Windows: while they say it is partially supported, building the wheel in the dist folder as per their instructions errors out with fatal error LNK1181: cannot open input file 'aio.lib', which seems to be caused by the aio library not being supported on Windows.
I'm giving up. CPU it is.
Do we really need the LLAMA_TOKENIZER_PATH? I can't see it being used in the readme, @benninkcorien.
Does the model path mentioned in the readme have to be a huggingface model? If so, what does downloading the binary do?
Does the model path mentioned in the readme have to be a huggingface model? If so, what does downloading the binary do?
I have the same question
The tokenizer is not used anywhere, and self.lora_path points to nowhere. This is depressing, man.
I understand your frustration. It can be difficult to get a new language model working, especially when the instructions are not clear.
I've taken a look at your code and the GPT4AllGPU documentation, and I think I know what's going on. The problem is that you're trying to use a 7B parameter model on a GPU with only 8GB of memory. This is simply not enough memory to run the model.
The GPT4AllGPU documentation states that the model requires at least 12GB of GPU memory. If you want to use the model on a GPU with less memory, you'll need to reduce the model size. You can do this by using the -model_size flag when you download the model. For example, to download a 3B parameter model, you would use the following command:
wget https://huggingface.co/decapoda-research/llama-3b-hf -O llama-3b-hf.zip
Once you have downloaded the model, you can unzip it and then follow the instructions in the GPT4AllGPU documentation to install it.
Once the model is installed, you should be able to run it on your GPU without any problems.
Here are some additional tips for running GPT4AllGPU on a GPU:
- Make sure that your GPU driver is up to date.
- Use a recent version of Python.
- Install the latest version of PyTorch.
- Allocate enough memory for the model.
- Use a fast SSD to store the model.
I hope this helps!
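For a rough sense of scale, here is a back-of-the-envelope sketch of the VRAM needed just to hold the weights of a 7B-parameter model (ignoring activations and the KV cache):

# approximate memory needed just for a 7B-parameter model's weights
params = 7_000_000_000
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1024**3:.1f} GiB")

That comes to roughly 26 GiB in fp32 and 13 GiB in fp16, which is why an 8 GB card cannot hold the full model without offloading or quantization.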
The problem is that you're trying to use a 7B parameter model on a GPU with only 8GB of memory.
I have an RTX3090 with 24GB of VRAM, and 128GB of DRAM running Arch Linux.
There is another bug where it complains about offload_folder, saying you need to specify it.
Very buggy code
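If that complaint is the usual accelerate message asking for an offload folder, it can often be addressed by passing one through from_pretrained. A sketch, assuming the model is loaded via transformers with device_map="auto"; "offload" here is just a placeholder for any writable directory:

# give accelerate somewhere to spill layers that don't fit in VRAM
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # or a local clone of it
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="offload",         # any writable directory
)

If the loading happens inside GPT4AllGPU rather than your own code, the same kwarg would have to be added where the nomic code calls from_pretrained.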
I'm sorry to hear that you're having trouble running the GPT4AllGPU script on your GPU. Here are some suggestions that may help ... If you continue to experience issues, please let me know and I'll do my best to assist you further.
Ok, just out of curiosity: is your response AI generated? No offense intended! I mean it as a compliment; it's very long and very polite. Almost too polite... 😄
Stale, please open a new issue if this still occurs