Saman28Khan

Results: 4 comments of Saman28Khan

I'm running Docker on Windows to use a GPTQ model. The response is slow even though it is using 12 GB of GPU memory. What could be the reason, and how can I handle it? Google colab...

!pip install --upgrade tensorrt
!git clone https://github.com/PromtEngineer/localGPT.git
%cd localGPT
!pip install -r requirements.txt
!python ingest.py --device_type cuda
!python run_localGPT.py --device_type cuda

In the constants.py file, change MODEL_ID to TheBloke/Llama-2-7b-Chat-GPTQ and MODEL_BASENAME to model.safetensors.
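The edit above would look roughly like this in constants.py (a sketch; the exact layout and variable placement in the localGPT repo may differ):

```python
# constants.py (sketch) -- model selection for localGPT
# Points localGPT at TheBloke's GPTQ-quantized Llama-2 7B chat model on Hugging Face.
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GPTQ"
# Name of the quantized weights file inside that model repo.
MODEL_BASENAME = "model.safetensors"
```

After saving the change, rerun run_localGPT.py so the new model is loaded.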

Did you find a solution for this? I'm also facing the same issue.