Tom-Neverwinter comments

Results 123 comments of


                                            Tom-Neverwinter

Multiple GPU, horrendous speed

call python server.py --auto-devices --chat --sdp-attention --model-menu also see your own log file. what model are you running? wizardlm?

pytorch cuda Out of memory rtx 4090

seems solved, model is not made for the system user is trying to run it on. other recommended models from [Aitrepreneur](https://www.youtube.com/@Aitrepreneur) Pygmalion 7B model: [https://huggingface.co/gozfarb/pygmal...](https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbHZKRkZ2Wnc1RWJvWHpJcFhnQ2RlOFdCZ2xwUXxBQ3Jtc0tueTM3RUU5MlJEa1F0MkRMMVJNcm9JcDl4VmhQamtBRW9WUHJLVTVUV3l0RV9aTGVUWERjZXpiVmhJSU5ZYmJ6UXJEOFhoODVTTHVheWhwTGJkQ3VTZ1duSWgwMUdnSGw3TXV5VDVjLTIzeWkxUHJZOA&q=https%3A%2F%2Fhuggingface.co%2Fgozfarb%2Fpygmalion-7b-4bit-128g-cuda&v=jhLHa9-JwDM) WizardLM Github: [https://github.com/nlpxucan/WizardLM](https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqa3Rac3NGQmVWUjFQSExha0xvVk1feFY3VDMzd3xBQ3Jtc0trZXVBdGY1U0xMNHBVdDdfX0lObG1UVmFpMFE2RzBlN1JZZUs2R0lGeTFHMjVfT2U1SEI5cFFXZktjUF9sRERzSnZpb05BLWJ6dDUwZWlhWlkzNnkwTEEtMFFzQTJkQnFncEk3Rk9zbkkzWW5NVEdGOA&q=https%3A%2F%2Fgithub.com%2Fnlpxucan%2FWizardLM&v=SaJ8wyKMBds) WizardLM model:...

28 percent faster cuda GPTQ with both act-order and groupsize supported

no error, seems to have shaved off 4 seconds on initial commit. INFO:Loading wizardLM-7B-HF... WARNING:Auto-assiging --gpu-memory 10 for your GPU to try to prevent out-of-memory errors. You can manually set...

28 percent faster cuda GPTQ with both act-order and groupsize supported

follow up for new commit: INFO:Loading wizardLM-7B-HF... WARNING:Auto-assiging --gpu-memory 10 for your GPU to try to prevent out-of-memory errors. You can manually set other values. Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2...

28 percent faster cuda GPTQ with both act-order and groupsize supported

only remaining issue I see: ``` Traceback (most recent call last): File “C:\Users\Tom_N\Desktop\oobabooga-windows\oobabooga-windows\text-generation-webui\server.py”, line 59, in load_model_wrapper shared.model, shared.tokenizer = load_model(shared.model_name) File “C:\Users\Tom_N\Desktop\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\models.py”, line 157, in load_model from modules.GPTQ_loader import...

mpt-7b-storywriter-4bit-128g model_type?

https://www.youtube.com/watch?v=QVVb6Md6huA&t=1s ubuntu https://www.youtube.com/watch?v=O9Y_ZdsuKWQ windows https://github.com/oobabooga/text-generation-webui/issues/354 https://github.com/oobabooga/text-generation-webui/issues/1927 https://github.com/oobabooga/text-generation-webui/issues/1915 https://github.com/oobabooga/text-generation-webui/issues/1856 tie in other similar issues making them easier to close when solved

Tom-Neverwinter

Multiple GPU, horrendous speed

pytorch cuda Out of memory rtx 4090

28 percent faster cuda GPTQ with both act-order and groupsize supported

28 percent faster cuda GPTQ with both act-order and groupsize supported

28 percent faster cuda GPTQ with both act-order and groupsize supported

mpt-7b-storywriter-4bit-128g model_type?

mpt-7b-storywriter-4bit-128g model_type?

mpt-7b-storywriter-4bit-128g model_type?

answer problem with VICUNA 13b 4bit model

[feature request] Implement Windows .bat Launcher for Devika