llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
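For orientation, here is a minimal sketch of the `llama2-wrapper` backend usage that several issues below refer to. It assumes the `LLAMA2_WRAPPER`/`get_prompt` API visible in the code-sample issue further down; the callable shorthand and the backend string are assumptions, not verified defaults.

```
# Minimal sketch of llama2-wrapper as a local backend. LLAMA2_WRAPPER and
# get_prompt appear in an issue below; calling the wrapper directly to get a
# completion string is an assumption based on that API.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER(backend_type="llama.cpp")  # "gptq" also appears in issues below
prompt = get_prompt("Hi, do you know PyTorch?")
print(llama2_wrapper(prompt))
```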
If multiple GPUs are used to run the GPTQ model, memory is only allocated on the first GPU, resulting in an error when it cannot allocate more...
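Not this project's code, but one generic way to avoid the single-GPU allocation described above is to let AutoGPTQ/Accelerate shard the layers with a `device_map`; the checkpoint name and memory caps below are illustrative assumptions.

```
# Generic multi-GPU GPTQ loading sketch (illustrative, not the repo's fix):
# device_map="auto" lets Accelerate place layers on every visible GPU instead
# of allocating everything on GPU 0.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-13B-chat-GPTQ",     # illustrative checkpoint
    device_map="auto",                    # shard layers across all GPUs
    max_memory={0: "10GiB", 1: "10GiB"},  # assumed per-GPU caps
    use_safetensors=True,
)
```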
Hi, I experienced a memory leak that may be connected to Gradio and to the issue discussed here: https://github.com/gradio-app/gradio/issues/3321 In the last messages there they write that the issue...
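A common mitigation people try for leaks like this (offered as an assumption, not a confirmed fix for the Gradio issue linked above) is to release Python and CUDA caches between generations:

```
# Hedged workaround sketch: force garbage collection and drop cached CUDA
# blocks between requests so leaked references accumulate more slowly.
import gc

import torch

def free_memory() -> None:
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```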
I am running this on a Mac M1 with 16 GB RAM, using `app.py` for simple text generation. Using `llama.cpp` from the terminal is much faster, but when I use the backend through...
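One plausible (unconfirmed) cause of the terminal-vs-backend gap is differing defaults for thread count and Metal offload; with llama-cpp-python these can be pinned explicitly. The path and values below are assumptions for an M1, not this user's configuration.

```
# Sketch assuming the backend is llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # illustrative path
    n_threads=6,     # match the thread count the faster terminal run used
    n_gpu_layers=1,  # any value > 0 enables Metal offload in Metal builds
)
```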
llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Intel i7-8700 | 5.4 GB RAM | 6.27 | 173.15
-- | -- | -- | -- | -- | --
llama-2-7b-chat.ggmlv3.q4_0 | 4 bit...
Hi! Code sample first:

```
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
from IPython.display import display, Markdown

chat_history = []
llama2_wrapper = LLAMA2_WRAPPER(
    backend_type="gptq",
)
user_input = input("You: ")
response_generator = llama2_wrapper.run(user_input, ...
```
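The truncated `run(...)` call above appears to return a generator; a minimal way to consume it, assuming it yields the accumulated response so far (so the last value is the full answer, as gradio streaming handlers usually expect), would be:

```
# Assumption: each yielded value is the response so far, so the final
# value is the complete reply.
response = ""
for response in response_generator:
    pass
display(Markdown(response))
```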
Hey there, I am new to this, so please consider that while writing your response. I read the readme and followed it... I didn't want to download the model...
multi gpu support, llama2-70b fix, download revision
Hello, when I'm using the gradio framework, the chatbot text occasionally gets stuck after I submit some inputs. The debug info given by Chrome is as follows; it looks like a...
Hi, I'm trying to add llama_index to a Llama 2 model using llama2-webui, but I'm not sure how to do it. I've read the documentation, but it doesn't seem to...
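llama_index's custom-LLM interface has changed across releases; purely as a sketch of the general shape (following the older `CustomLLM` pattern from the llama_index docs of that period, with all import paths, names, and kwargs treated as assumptions), one could wrap the local backend like this:

```
# Hedged sketch only: wraps llama2-wrapper behind llama_index's CustomLLM
# interface as documented around v0.7/0.8; the exact import paths and the
# callback decorator have moved between versions.
from llama_index.llms import CompletionResponse, CustomLLM, LLMMetadata
from llama_index.llms.base import llm_completion_callback
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2 = LLAMA2_WRAPPER()  # local backend from this repo

class LocalLlama2(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=4096, num_output=256, model_name="llama-2-local")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        # Assumes the wrapper is callable and returns the completion string.
        return CompletionResponse(text=llama2(get_prompt(prompt)))

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs):
        # Minimal non-streaming fallback: yield the full completion once.
        yield CompletionResponse(text=llama2(get_prompt(prompt)))
```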
`error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024` The exact same settings and quantization work for 7B and 13B. Here is...
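For context on the shape mismatch above (a likely explanation, not a confirmed diagnosis of this report): Llama 2 70B uses grouped-query attention with 8 KV heads, so a `wk` of 8192 x 1024 is the correct shape, and GGML-era llama.cpp builds needed the GQA factor passed explicitly. A sketch assuming the llama-cpp-python backend:

```
# 70B-specific flag for GGML-era builds: n_gqa=8 tells llama.cpp the model
# uses grouped-query attention (64 query heads sharing 8 KV heads), which is
# why wk is 8192 x 1024 rather than 8192 x 8192. Path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",
    n_gqa=8,
)
```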