Tom-Neverwinter
Call `python server.py --auto-devices --chat --sdp-attention --model-menu` and also check your own log file. What model are you running? WizardLM?
Seems solved: the model is not made for the system the user is trying to run it on. Other recommended models from [Aitrepreneur](https://www.youtube.com/@Aitrepreneur):

- Pygmalion 7B model: https://huggingface.co/gozfarb/pygmalion-7b-4bit-128g-cuda
- WizardLM GitHub: https://github.com/nlpxucan/WizardLM
- WizardLM model: ...
No error; it seems to have shaved 4 seconds off the initial commit.

```
INFO:Loading wizardLM-7B-HF...
WARNING:Auto-assiging --gpu-memory 10 for your GPU to try to prevent out-of-memory errors. You can manually set...
```
Follow-up for the new commit:

```
INFO:Loading wizardLM-7B-HF...
WARNING:Auto-assiging --gpu-memory 10 for your GPU to try to prevent out-of-memory errors. You can manually set other values.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2...
```
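For reference, the cap can also be set by hand with the same flag the warning mentions; a minimal example, reusing the value the auto-assignment picked (adjust to whatever fits your card):

```shell
# Manually cap GPU memory instead of relying on the auto-assigned value
python server.py --auto-devices --chat --sdp-attention --gpu-memory 10
```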
The only remaining issue I see:

```
Traceback (most recent call last):
  File "C:\Users\Tom_N\Desktop\oobabooga-windows\oobabooga-windows\text-generation-webui\server.py", line 59, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\Tom_N\Desktop\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\models.py", line 157, in load_model
    from modules.GPTQ_loader import...
```
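If it helps narrow this down, a rough sanity check, assuming the GPTQ-for-LLaMa build step produced the `quant_cuda` extension (if your build names it differently, adjust accordingly):

```shell
# If either import fails here, the `from modules.GPTQ_loader import ...`
# line in models.py will fail the same way, so this isolates the dependency.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import quant_cuda"
```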
Tying in other similar issues to make them easier to close when solved:

- https://www.youtube.com/watch?v=QVVb6Md6huA&t=1s (Ubuntu)
- https://www.youtube.com/watch?v=O9Y_ZdsuKWQ (Windows)
- https://github.com/oobabooga/text-generation-webui/issues/354
- https://github.com/oobabooga/text-generation-webui/issues/1927
- https://github.com/oobabooga/text-generation-webui/issues/1915
- https://github.com/oobabooga/text-generation-webui/issues/1856
https://github.com/oobabooga/text-generation-webui/issues/1828 is the trunk issue for this item; it should answer most questions.
https://github.com/abetlen, https://github.com/ggerganov, and https://github.com/jllllll/GPTQ-for-LLaMa/commits?author=jllllll in case they are not aware? [Pretty sure they know, but just in case, as always.]
> ```shell
> nvidia M40-24G
> ```

How much VRAM is needed for 8-bit, and is it running the full model or just the 4-bit version?
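For a rough sense of scale, a back-of-envelope estimate of the weight footprint alone for a 7B-parameter model (this ignores activations, KV cache, and framework overhead, so real usage runs higher):

```shell
# 7B parameters at 1 byte each (8-bit) vs. 0.5 bytes each (4-bit)
python -c "print('8-bit weights: %.1f GiB' % (7e9 * 1.0 / 2**30))"  # ~6.5 GiB
python -c "print('4-bit weights: %.1f GiB' % (7e9 * 0.5 / 2**30))"  # ~3.3 GiB
```

On that math a 24 GB M40 has headroom for 7B weights at 8-bit; which version is actually loaded is the question above.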
Just simple pseudocode I had ChatGPT make based on the installation instructions:

```bat
@echo off
SETLOCAL EnableDelayedExpansion

:: Check if script is run as administrator
net session >nul 2>&1
if %errorlevel% neq 0 (
    echo This script must be run as administrator.
    exit /b 1
)
:: ...
```
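To try it, save it under some name like `install.bat` (the name is just an example) and run it from an elevated prompt; `net session` exits non-zero when the shell lacks admin rights, which is what the check keys off.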