text-generation-webui
Any chance we could get compatibility w/ the Nvidia Maxwell generation of GPUs? I've got a Tesla M40 24G card and I can't run any of the models...
The Tesla P40 (P as in Peter) does alright with oobabooga, but when I try to use it with --load-in-8bit I get errors about 8-bit matmul not being supported. This is interesting, because the documentation says the card does support those features.
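For context, --load-in-8bit routes matrix multiplies through bitsandbytes, whose int8 matmul kernels have historically required a compute capability of at least 7.5 (Turing or newer), which neither Maxwell nor Pascal has. A minimal sketch of that check, assuming that threshold and using a small hand-written lookup table rather than querying CUDA:

```python
# Sketch: why --load-in-8bit fails on Maxwell/Pascal cards.
# Assumption: early bitsandbytes releases required compute capability >= 7.5
# for int8 matmul. The lookup table below is hand-written, not queried from CUDA.

INT8_MATMUL_MIN_CC = (7, 5)  # assumed threshold (Turing and newer)

GPU_COMPUTE_CAPABILITY = {
    "Tesla M40": (5, 2),  # Maxwell
    "Tesla P40": (6, 1),  # Pascal
    "RTX 2080": (7, 5),   # Turing
}

def supports_int8_matmul(gpu_name: str) -> bool:
    """True if the (hardcoded) compute capability meets the assumed int8 threshold."""
    return GPU_COMPUTE_CAPABILITY[gpu_name] >= INT8_MATMUL_MIN_CC

for name in GPU_COMPUTE_CAPABILITY:
    print(f"{name}: int8 matmul supported = {supports_int8_matmul(name)}")
```

Tuple comparison does the right thing here: (6, 1) < (7, 5), so the P40 fails the check even though its documentation advertises int8 arithmetic (the card has int8 DP4A instructions, but not the tensor-core path the kernels target).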
I've spent the better part of two days battling mismatched libraries and outdated directions with my own M40 24GB, only to stall here; at least this is a definitive answer, and an issue to watch.
To my knowledge, the M40 is a different architecture (Maxwell) and may be missing some features/instructions needed to run properly. Someone seems to have solved it by using an older version of the software, but I have yet to get it running on my own M40. The link is here:
https://github.com/danmincu/text-generation-webui-m40
If anyone gets it working, please let me know. I've spent too much time on this subject and it feels like a hard one to fix.
That repo is based on a version of GPTQ from March (see here), which AFAIK is incompatible with any newer GPTQ quantizations due to breaking changes made in early April (see here). It will fail to load newer quants with a KeyError exception. I haven't even gotten the latest qwopqwop200/cuda branch to work with newer quants for some reason.
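The KeyError mentioned above is the typical symptom of a format mismatch: the loader indexes the checkpoint's state dict by the tensor names it expects, and a quant produced by an incompatible GPTQ version fails as soon as a name doesn't line up. A sketch of that failure mode (the tensor names are illustrative, not the real GPTQ layout):

```python
# Sketch of the KeyError failure mode when loading a quant made by an
# incompatible GPTQ version. Tensor names here are illustrative only.

loader_expects = ["layer0.qweight", "layer0.qzeros", "layer0.scales"]

# A checkpoint from a different GPTQ version is missing one expected tensor:
mismatched_checkpoint = {
    "layer0.qweight": "...",
    "layer0.scales": "...",
}

def load(expected_names, state_dict):
    # A strict lookup raises KeyError on the first missing tensor.
    return {name: state_dict[name] for name in expected_names}

try:
    load(loader_expects, mismatched_checkpoint)
except KeyError as missing:
    print(f"failed to load: missing tensor {missing}")
```

This is why re-quantizing the model with the matching GPTQ version, rather than patching the loader, is usually the practical fix.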
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
I managed to get the M40 working with the latest branch at the time of writing, so my fork is now obsolete. From a bird's-eye view, these are the steps:
a) install Ubuntu 22.04.3 LTS Server (not Desktop)
b) remove the existing NVIDIA driver(s) and install them together with CUDA 11.7, e.g.
sudo apt-add-repository -r ppa:graphics-drivers/ppa
sudo apt update
sudo apt remove 'nvidia*'
sudo apt autoremove
# install CUDA 11.7 (the runfile includes the matching 515 driver)
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
c) git clone this repo and run start_linux.sh
and when the installer asks, choose NOT to use the latest CUDA but 11.8 instead <- this was very important!!!
d) run this to get a shell inside the installer's environment >>> ./cmd_linux.sh
e) run this to download a sample model >>> python download-model.py anon8231489123/vicuna-13b-GPTQ-4bit-128g
f) run ./start_linux.sh --listen
in the UI, select the model and set wbits + groupsize to match the sample model (for this one: wbits = 4, groupsize = 128, as in the model name)
You should be able to get it working at this point!
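The CUDA pin in step (c) is the load-bearing detail: prebuilt PyTorch/GPTQ wheels are compiled against one CUDA series (e.g. "cu117", "cu118"), and a toolkit from a different series can fail to load or build extensions. A sketch of that compatibility rule, assuming simple string bookkeeping (both helper names here are hypothetical, not part of any installer):

```python
# Sketch: why pinning CUDA 11.8 in step (c) matters. Wheels are tagged
# by CUDA series, and mixing series is a common source of breakage.
# Both helpers below are hypothetical illustrations.

def cuda_series(version: str) -> str:
    """Map a toolkit version like '11.8.0' to a wheel tag like 'cu118'."""
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"

def compatible(toolkit_version: str, wheel_tag: str) -> bool:
    """A wheel built for one CUDA series only matches that series."""
    return cuda_series(toolkit_version) == wheel_tag

print(compatible("11.8.0", "cu118"))  # the combination chosen in step (c)
print(compatible("12.1.0", "cu118"))  # a newer toolkit would not match
```

The same logic explains why the installer prompt in step (c) offers a choice at all: picking "latest" would pull wheels for a CUDA series the older card's driver stack isn't set up for.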