llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
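For orientation, here is a minimal sketch of the `llama2-wrapper` backend usage that several issues below refer to. It assumes the `LLAMA2_WRAPPER`/`get_prompt` API visible in the code-sample issue further down; the callable shorthand and the backend string are assumptions, not verified defaults.

```
# Minimal sketch of llama2-wrapper as a local backend. LLAMA2_WRAPPER and
# get_prompt appear in an issue below; calling the wrapper directly to get a
# completion string is an assumption based on that API.
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2_wrapper = LLAMA2_WRAPPER(backend_type="llama.cpp")  # "gptq" also appears in issues below
prompt = get_prompt("Hi, do you know PyTorch?")
print(llama2_wrapper(prompt))
```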
If multiple GPUs are used to run the GPTQ model, memory is only allocated on the first GPU, resulting in an error when it cannot allocate more...
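Not this project's code, but one generic way to avoid the single-GPU allocation described above is to let AutoGPTQ/Accelerate shard the layers with a `device_map`; the checkpoint name and memory caps below are illustrative assumptions.

```
# Generic multi-GPU GPTQ loading sketch (illustrative, not the repo's fix):
# device_map="auto" lets Accelerate place layers on every visible GPU instead
# of allocating everything on GPU 0.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-13B-chat-GPTQ",     # illustrative checkpoint
    device_map="auto",                    # shard layers across all GPUs
    max_memory={0: "10GiB", 1: "10GiB"},  # assumed per-GPU caps
    use_safetensors=True,
)
```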
Hi, I experienced a memory leak that may be connected to Gradio and to the issue discussed here: https://github.com/gradio-app/gradio/issues/3321 In the last messages there they write that the issue...
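A common mitigation people try for leaks like this (offered as an assumption, not a confirmed fix for the Gradio issue linked above) is to release Python and CUDA caches between generations:

```
# Hedged workaround sketch: force garbage collection and drop cached CUDA
# blocks between requests so leaked references accumulate more slowly.
import gc

import torch

def free_memory() -> None:
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```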
I am running this on a Mac M1 with 16 GB RAM, using `app.py` for simple text generation. Using `llama.cpp` from the terminal is much faster, but when I use the backend through...
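One plausible (unconfirmed) cause of the terminal-vs-backend gap is differing defaults for thread count and Metal offload; with llama-cpp-python these can be pinned explicitly. The path and values below are assumptions for an M1, not this user's configuration.

```
# Sketch assuming the backend is llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",  # illustrative path
    n_threads=6,     # match the thread count the faster terminal run used
    n_gpu_layers=1,  # any value > 0 enables Metal offload in Metal builds
)
```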
llama-2-7b-chat.ggmlv3.q4_0 | 4 bit | Intel i7-8700 | 5.4 GB RAM | 6.27 | 173.15
-- | -- | -- | -- | -- | --
llama-2-7b-chat.ggmlv3.q4_0 | 4 bit...
Hi! Code sample first:

```
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt
from IPython.display import display, Markdown

chat_history = []
llama2_wrapper = LLAMA2_WRAPPER(
    backend_type="gptq",
)
user_input = input("You: ")
response_generator = llama2_wrapper.run(user_input, ...
```
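The truncated `run(...)` call above appears to return a generator; a minimal way to consume it, assuming it yields the accumulated response so far (so the last value is the full answer, as gradio streaming handlers usually expect), would be:

```
# Assumption: each yielded value is the response so far, so the final
# value is the complete reply.
response = ""
for response in response_generator:
    pass
display(Markdown(response))
```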
Hey there, I am new to this, so please consider that while writing your response. I read the readme and followed it... I didn't want to download the model...
multi gpu support, llama2-70b fix, download revision
Hello, when I'm using the gradio framework, the chatbot text occasionally gets stuck after I submit some inputs. The debug info given by Chrome is as follows; it looks like a...
Hi, I'm trying to add llama_index to a Llama 2 model using llama2-webui, but I'm not sure how to do it. I've read the documentation, but it doesn't seem to...
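llama_index's custom-LLM interface has changed across releases; purely as a sketch of the general shape (following the older `CustomLLM` pattern from the llama_index docs of that period, with all import paths, names, and kwargs treated as assumptions), one could wrap the local backend like this:

```
# Hedged sketch only: wraps llama2-wrapper behind llama_index's CustomLLM
# interface as documented around v0.7/0.8; the exact import paths and the
# callback decorator have moved between versions.
from llama_index.llms import CompletionResponse, CustomLLM, LLMMetadata
from llama_index.llms.base import llm_completion_callback
from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

llama2 = LLAMA2_WRAPPER()  # local backend from this repo

class LocalLlama2(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=4096, num_output=256, model_name="llama-2-local")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        # Assumes the wrapper is callable and returns the completion string.
        return CompletionResponse(text=llama2(get_prompt(prompt)))

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs):
        # Minimal non-streaming fallback: yield the full completion once.
        yield CompletionResponse(text=llama2(get_prompt(prompt)))
```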
`error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024` The exact same settings and quantization work for 7B and 13B. Here is...
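For context on the shape mismatch above (a likely explanation, not a confirmed diagnosis of this report): Llama 2 70B uses grouped-query attention with 8 KV heads, so a `wk` of 8192 x 1024 is the correct shape, and GGML-era llama.cpp builds needed the GQA factor passed explicitly. A sketch assuming the llama-cpp-python backend:

```
# 70B-specific flag for GGML-era builds: n_gqa=8 tells llama.cpp the model
# uses grouped-query attention (64 query heads sharing 8 KV heads), which is
# why wk is 8192 x 1024 rather than 8192 x 8192. Path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",
    n_gqa=8,
)
```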