Add explanation for Windows users on how to create EXE files
Discussed in https://github.com/Mozilla-Ocho/llamafile/discussions/418
Originally posted by fabiomatricardi May 15, 2024

Hi, I tried to ask in the Discord channel but got no replies... so after a week of struggling I figured out how to do it. I would like this to be on the main page of the repo.
How to create .exe files in Windows
- download a GGUF smaller than 4 GB, since Windows cannot run executables larger than 4 GB (in my example qwen1_5-0_5b-chat-q8_0.gguf, from the official Qwen repo: it already includes the chat template and tokenizer in the GGUF)
- download the zip file of the latest llamafile release and unzip it in the same folder as the GGUF
- rename its extension to .exe
- download zipalign (also available on the llamafile releases page) and unzip it in the same folder
- rename its extension to .exe as well (see the example commands after this list)
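For example, assuming the 0.8.4 release (the exact downloaded file names depend on the release, so adjust them to what is actually in your folder), the renames from a PowerShell terminal would look like:

```powershell
# assumed file names for the 0.8.4 release; check your folder for the real ones
ren llamafile-0.8.4 llamafile-0.8.4.exe
ren zipalign-0.8.4 zipalign.exe
```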
In my case I want the executable to run the API server with a few extra arguments (context length).
Create a .args file as explained in Creating Llamafiles. The file will contain:
```
-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...
```
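The trailing ... is part of the .args syntax: it marks where any additional command-line arguments passed at run time are inserted. If you prefer creating the file from the terminal rather than a text editor, a PowerShell sketch (same file name and contents as above):

```powershell
# write the .args file, one argument per line; any text editor works just as well
@"
-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...
"@ | Set-Content .args
```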
In the terminal, run the following to create the base binary:

```powershell
copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile
```
Then use zipalign to bundle the llamafile, the GGUF file, and the arguments (the files are stored uncompressed so the weights can be mapped straight out of the archive):

```powershell
.\zipalign.exe -j0 qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8_0.gguf .args
```
Finally, rename the .llamafile to .exe:

```powershell
ren qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8.exe
```
Run the Qwen model
From the terminal, run:

```powershell
.\qwen1_5-0_5b-chat-q8.exe --nobrowser
```
This will load the model and start the webserver without opening the browser.
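Before wiring up a client, you can check that the server is answering by querying the OpenAI-compatible models endpoint (this sketch assumes the default port 8080):

```powershell
# quick sanity check against the local server; default port assumed
Invoke-RestMethod http://localhost:8080/v1/models
```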
Python API call
```python
from openai import OpenAI
import sys

# Point to the local server
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
history = [
    {"role": "system", "content": "You are QWEN05, an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful. Always reply in the language of the instructions."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]
print("\033[92;1m")  # bright green
while True:
    userinput = ""
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.3,
        frequency_penalty=1.4,
        max_tokens=600,
        stream=True,
    )
    # Stream the assistant's reply and accumulate it for the history
    new_message = {"role": "assistant", "content": ""}
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content
    history.append(new_message)
    print("\033[1;30m")  # dark grey
    print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:")
    print("\033[91;1m")  # red
    lines = sys.stdin.readlines()  # readlines keeps each trailing newline
    for line in lines:
        userinput += line
    if lines and "quit!" in lines[0].lower():
        print("\033[0mBYE BYE!")
        break
    # Reset the context, keeping only the system prompt plus the new user turn
    history = [
        {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    ]
    history.append({"role": "user", "content": userinput})
    print("\033[92;1m")  # back to bright green
```
The input accepts multi-line entries: when you are finished, press Ctrl+Z and Enter (Ctrl+D on Unix).
To exit, type quit!, then Ctrl+Z and Enter.
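For a quick test without Python, the same chat completions endpoint can be called directly; a PowerShell sketch (endpoint and port as above, request shape per the OpenAI chat API):

```powershell
# one-shot chat completion against the local llamafile server
$body = @{
    model    = "local-model"   # unused by the server, but part of the API shape
    messages = @(@{ role = "user"; content = "Hello, who are you?" })
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri "http://localhost:8080/v1/chat/completions" `
    -Method Post -ContentType "application/json" -Body $body
```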
Are you able to open a PR with your proposed changes so it can be reviewed and potentially merged in?
sure... how?
@jart is this actually more suitable for a wiki instead? If so then could you enable it? This might be more of a freeform doc than a formal instruction.
Otherwise, would it make more sense for him to make a new folder /docs/ in the repo?
@fabiomatricardi how experienced are you with making GitHub contributions? If not very experienced, we can try to accommodate, but we do recommend you learn a bit about how to use GitHub pull requests so your contributions can be more easily tracked.
Hi Brian, I have never done it before, but I can try; I have only maintained my own repositories. Just let me know whether I should create a new folder like llamafile/docs/windowswiki or use a wiki instead. If I should go the pull-request route, let me know whether to do it from a fork, or whether you will give me write permissions to the repo.
Waiting to hear from you,
Matricardi Fabio