
Add explanation for Windows users on how to create EXE files

Open fabiomatricardi opened this issue 1 year ago • 6 comments

Discussed in https://github.com/Mozilla-Ocho/llamafile/discussions/418

Originally posted by fabiomatricardi May 15, 2024. Ciao, I tried to ask in the Discord channel but got no replies, so after a week of struggling I figured out how to do it. I would like this to be on the main page of the repo.

How to create .exe files in Windows

  • download a GGUF smaller than 4 GB (Windows cannot run executables larger than 4 GB; in my example qwen1_5-0_5b-chat-q8_0.gguf from the official Qwen repo: it already has the chat template and tokenizer included in the GGUF)
  • download the zip file for the latest llamafile release here and unzip it in the same folder as the GGUF
  • rename the extension to .exe

  • download zipalign from here and unzip it in the same folder
  • rename the extension to .exe, as shown in the example after this list
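For example, assuming the unzipped binaries are named llamafile-0.8.4 (the release used later in this guide) and zipalign (adjust the filenames to whatever you actually downloaded), the renames from a Windows terminal look like this:

ren llamafile-0.8.4 llamafile-0.8.4.exe
ren zipalign zipalign.exe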

In my case I want the executable to run the API server with a few more arguments (context length).

Create a .args file as explained in Creating Llamafiles.

The file will contain one argument per line; the trailing ... is a placeholder where any extra arguments passed on the command line get inserted:

-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...

In the terminal, run the following to create the base binary:

copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile

Then use zipalign to bundle the GGUF file and the .args file into the llamafile (-j0 stores them uncompressed, so the weights can be mapped directly into memory):

.\zipalign.exe -j0 qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8_0.gguf .args

Finally, rename the .llamafile to .exe:

ren qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8.exe

Run the Qwen model

From the terminal, run:

.\qwen1_5-0_5b-chat-q8.exe --nobrowser

This will load the model and start the web server without opening the browser.
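To sanity-check that the server is reachable before wiring up any client code, you can send a single request from PowerShell (a minimal sketch, assuming the default port 8080 that the Python script below also uses):

$body = '{"model":"local-model","messages":[{"role":"user","content":"Say hello"}]}'
Invoke-RestMethod -Uri "http://localhost:8080/v1/chat/completions" -Method Post -ContentType "application/json" -Body $body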

Python API call

from openai import OpenAI
import sys

# Point to the local server (the llamafile web server defaults to port 8080)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
history = [
    {"role": "system", "content": "You are QWEN05, an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful. Always reply in the language of the instructions."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]
print("\033[92;1m")  # bright green
while True:
    userinput = ""
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.3,
        frequency_penalty=1.4,
        max_tokens=600,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}

    # Print the streamed tokens as they arrive
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

    print("\033[1;30m")  # dark grey
    print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:")
    print("\033[91;1m")  # red
    lines = sys.stdin.readlines()  # read until EOF (Ctrl+D / Ctrl+Z)
    for line in lines:
        userinput += line  # readlines() keeps the trailing newlines
    if lines and "quit!" in lines[0].lower():
        print("\033[0mBYE BYE!")
        break
    # Reset the history each turn: the model only sees the system prompt
    # and the latest user message, keeping the context small
    history = [
        {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    ]
    history.append({"role": "user", "content": userinput})
    print("\033[92;1m")  # back to bright green

The script accepts multi-line entries in the input: when finished, press Ctrl+Z and Enter (Ctrl+D on Unix).

To exit, type quit! followed by Ctrl+Z and Enter.

fabiomatricardi avatar May 15 '24 09:05 fabiomatricardi

Are you able to make a PR proposal with your proposed changes so it can be reviewed and potentially merged in?

mofosyne avatar May 21 '24 16:05 mofosyne

sure... how?

fabiomatricardi avatar May 23 '24 09:05 fabiomatricardi

@jart is this actually more suitable for a wiki instead? If so, could you enable it? This might be more of a freeform doc than formal instructions.

Otherwise, would it make more sense for him to make a new /docs/ folder in the repo?

@fabiomatricardi how experienced are you with making GitHub contributions? If not very experienced then we could try to accommodate, but we do recommend you learn a bit about how to use GitHub pull requests so your contributions can be more easily tracked.

mofosyne avatar May 23 '24 10:05 mofosyne

Hi Brian, I have never done it before, but I can try. I have only maintained my own repositories. Just let me know whether I should create a new folder like llamafile/docs/windowswiki or use a wiki instead. In case I have to go for a pull request, let me know if I should do it on a fork, or whether you will give me write permissions to the repo.

Waiting to hear from you,

Matricardi Fabio


fabiomatricardi avatar May 23 '24 12:05 fabiomatricardi

copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile

Make qwen2_5-7b-chat.llamafile

bphd avatar Dec 19 '24 07:12 bphd