
very slow GPU compared with CPU

Open EISMANN-DEV opened this issue 2 years ago • 15 comments

Hi all! The model is working great! I am trying to use my 8GB 4060 Ti with MODEL_ID = "TheBloke/vicuna-7B-v1.5-GPTQ" and MODEL_BASENAME = "model.safetensors".

I changed the GPU today; the previous one was old.

But it takes a few minutes to get a result. However, I notice I am now getting these messages while running the model:

2023-09-06 19:16:07,759 - INFO - _base.py:727 - lm_head not been quantized, will be ignored when make_quant.
2023-09-06 19:16:07,760 - WARNING - qlinear_old.py:16 - **CUDA extension not installed.**
2023-09-06 19:16:12,071 - WARNING - fused_llama_mlp.py:306 - skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
C:\Users\a\anaconda3\Lib\site-packages\transformers\generation\configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
C:\Users\a\anaconda3\Lib\site-packages\transformers\generation\configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
The model 'LlamaGPTQForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
2023-09-06 19:16:12,184 - INFO - run_localGPT.py:127 - Local LLM Loaded

Can someone tell me what is going on?

EISMANN-DEV avatar Sep 06 '23 17:09 EISMANN-DEV

If you use the nvidia-smi command, what is your VRAM usage?
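
Alternatively, here is a quick sanity check from Python that PyTorch actually sees the card (a minimal sketch using plain PyTorch calls, nothing localGPT-specific; if the first line prints False, everything is running on the CPU no matter what the config says):

```python
# Quick sanity check: does PyTorch see the GPU at all?
import torch

print(torch.cuda.is_available())               # False -> everything runs on the CPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))       # e.g. "NVIDIA GeForce RTX 4060 Ti"
    print(torch.version.cuda)                  # CUDA version PyTorch was built against
    free, total = torch.cuda.mem_get_info()    # free/total VRAM in bytes
    print(f"{free / 2**20:.0f} MiB free of {total / 2**20:.0f} MiB")
```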

LeafmanZ avatar Sep 06 '23 20:09 LeafmanZ

Hi! While running it:

```
Wed Sep  6 22:39:05 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti   WDDM  | 00000000:01:00.0  On |                  N/A |
| 30%   42C    P2              46W / 160W |   7870MiB /  8188MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2884    C+G   ...72.0_x64__8wekyb3d8bbwe\GameBar.exe      N/A    |
|    0   N/A  N/A      3512    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A    |
|    0   N/A  N/A      7636    C+G   ...siveControlPanel\SystemSettings.exe      N/A    |
|    0   N/A  N/A      8560    C     ...\anaconda3\envs\localGPT\python.exe      N/A    |
|    0   N/A  N/A      9952    C+G   C:\Windows\explorer.exe                     N/A    |
|    0   N/A  N/A     10800    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A    |
|    0   N/A  N/A     11008    C+G   ...les (x86)\Battle.net\Battle.net.exe      N/A    |
|    0   N/A  N/A     12016    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A    |
|    0   N/A  N/A     12176    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A    |
|    0   N/A  N/A     13348    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A    |
|    0   N/A  N/A     13636    C+G   ...air\Corsair iCUE5 Software\iCUE.exe      N/A    |
|    0   N/A  N/A     14600    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A    |
|    0   N/A  N/A     15100    C+G   ...inaries\Win64\EpicGamesLauncher.exe      N/A    |
|    0   N/A  N/A     15384    C+G   C:\Program Files\NZXT CAM\NZXT CAM.exe      N/A    |
|    0   N/A  N/A     15672    C+G   ...ne\Binaries\Win64\EpicWebHelper.exe      N/A    |
|    0   N/A  N/A     16296    C+G   C:\Program Files\NZXT CAM\NZXT CAM.exe      N/A    |
|    0   N/A  N/A     18364    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A    |
|    0   N/A  N/A     19188    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A    |
|    0   N/A  N/A     19720    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A    |
|    0   N/A  N/A     20388    C+G   ...41.0_x64__zpdnekdrzrea0\Spotify.exe      N/A    |
|    0   N/A  N/A     23468    C+G   ...oogle\Chrome\Application\chrome.exe      N/A    |
+---------------------------------------------------------------------------------------+
```

I'm pretty sure something is interfering with this card, since other computers at my work run it well with more or less the same specs. They also give the CUDA message I posted above, but the model is still okay; I can see in Task Manager that the card is being used while generating text.

EISMANN-DEV avatar Sep 06 '23 20:09 EISMANN-DEV

Can an oobabooga installation have an effect on this?

EISMANN-DEV avatar Sep 06 '23 21:09 EISMANN-DEV

So I managed to fix it: first I reinstalled oobabooga with CUDA support (I don't know if it influenced localGPT), then completely reinstalled localGPT and its environment.

EDIT: I read somewhere that there is a problem with memory allocation in the newer NVIDIA drivers. I am currently on 537.13 but have to use 532.03 for it to work. The post I read said the 531 drivers were safe to use, but my 4060 Ti only goes back to 532.03, because the card was released after 531.

EISMANN-DEV avatar Sep 07 '23 07:09 EISMANN-DEV

I'm running Docker on Windows to use a GPTQ model. The response is slow even though it is using a 12GB GPU. What could be the reason, and how do I handle it? Google Colab also uses a 12GB GPU and it is fast. Model: Llama 2 7B Chat GPTQ

Saman28Khan avatar Sep 10 '23 04:09 Saman28Khan

> I'm running Docker on Windows to use a GPTQ model. The response is slow even though it is using a 12GB GPU. What could be the reason, and how do I handle it? Google Colab also uses a 12GB GPU and it is fast. Model: Llama 2 7B Chat GPTQ

Hi,

Have you managed to run this on Google Colab? Can you please share the details of the runtime and the notebook if possible? I am trying to run it in Colab on a T4 GPU with 12GB of CPU RAM and 15GB of GPU RAM, but it keeps crashing after entering the prompt with the following error:

```
Enter a query: how to elect american president
ggml_allocr_alloc: not enough space in the buffer (needed 143278592, largest block available 17334272)
GGML_ASSERT: ggml-alloc.c:139: !"not enough space in the buffer"
```

shishir332 avatar Sep 10 '23 22:09 shishir332

```
!pip install --upgrade tensorrt
!git clone https://github.com/PromtEngineer/localGPT.git
%cd localGPT
!pip install -r requirements.txt
!python ingest.py --device_type cuda
!python run_localGPT.py --device_type cuda
```

Saman28Khan avatar Sep 11 '23 17:09 Saman28Khan

> !pip install --upgrade tensorrt
> !git clone https://github.com/PromtEngineer/localGPT.git
> %cd localGPT
> !pip install -r requirements.txt
> !python ingest.py --device_type cuda
> !python run_localGPT.py --device_type cuda

Thanks, but that doesn't work anymore on a T4 GPU. I tried upgrading to a better GPU on Colab Pro, but to no avail. 👎

shishir332 avatar Sep 11 '23 18:09 shishir332

In the constants.py file, change MODEL_ID to TheBloke/Llama-2-7b-Chat-GPTQ and MODEL_BASENAME to model.safetensors.
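
For reference, the relevant lines in constants.py would then look something like this (a minimal sketch; every other setting in localGPT's constants.py stays as it is):

```python
# constants.py -- only the two settings discussed above are shown.
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GPTQ"  # Hugging Face repo of the GPTQ model
MODEL_BASENAME = "model.safetensors"        # quantized weights file inside that repo
```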

Saman28Khan avatar Sep 12 '23 03:09 Saman28Khan

So I ditched my RTX 4060 Ti and moved to an RTX 4070: 8GB vs. 12GB of VRAM.

I don't get any answer from this model; it just hangs:
MODEL_ID = "TheBloke/Llama-2-13B-GPTQ"
MODEL_BASENAME = "model.safetensors"

And this model:
MODEL_ID = "TheBloke/vicuna-7B-v1.5-GPTQ"
MODEL_BASENAME = "model.safetensors"

just gives a blank answer. Does anyone know what is happening?

EISMANN-DEV avatar Sep 13 '23 15:09 EISMANN-DEV

So I can confirm the models stopped working only because I am now using run_localGPT_v2.py; when going back to run_localGPT.py, it works again. Something for you, @PromtEngineer? Thanks for the effort.

EISMANN-DEV avatar Sep 13 '23 15:09 EISMANN-DEV

@N1h1lv5 I hope with this new update, the issue is solved. Can you please confirm?

PromtEngineer avatar Sep 18 '23 07:09 PromtEngineer

> @N1h1lv5 I hope with this new update, the issue is solved. Can you please confirm?

The new run_localGPT.py is working, but some models still give empty answers, as you know.

EISMANN-DEV avatar Sep 18 '23 08:09 EISMANN-DEV

I tried this Dockerfile with CUDA 11.7 and am observing this error:

  • NVIDIA driver on your system is too old ---> alternatively, go to a PyTorch version built for your driver

@PromtEngineer - any suggestion would be highly appreciated. Thanks in advance.

WIIN-AI avatar Oct 10 '23 05:10 WIIN-AI

I don't know if anyone has tried it, but if you use GPTQ, there is a warning that says to remove the temperature. So I tried removing it, and everything works great.

run_localGPT.py
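
For anyone else hitting this, here is a minimal sketch of what the warnings at the top of this thread are asking for, using the Hugging Face GenerationConfig API (standard transformers arguments; the exact call site in run_localGPT.py may differ):

```python
from transformers import GenerationConfig

# Option A: greedy decoding -- drop the sampling-only knobs entirely,
# which is the "remove the temperature" fix described above.
generation_config = GenerationConfig(max_new_tokens=512)

# Option B: keep temperature/top_p, but actually enable sampling so
# the `do_sample` warnings go away.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.9,
    top_p=0.6,
    max_new_tokens=512,
)
```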

Bhavya031 avatar Apr 17 '24 05:04 Bhavya031