
Add explanation for Windows users on how to create EXE files

Open fabiomatricardi opened this issue 1 year ago • 6 comments

Discussed in https://github.com/Mozilla-Ocho/llamafile/discussions/418

Originally posted by fabiomatricardi May 15, 2024. Ciao, I tried to ask in the Discord channel but got no replies, so after a week of struggling I figured out how to do it. I would like this to be on the main page of the repo.

How to create .exe files in Windows

  • download a GGUF smaller than 4 GB (Windows cannot run executables larger than 4 GB; in my example qwen1_5-0_5b-chat-q8_0.gguf from the official Qwen repo: it already has the chat template and tokenizer included in the GGUF)
  • download the zip file for the latest llamafile release here and unzip it in the same folder as the GGUF
  • rename the extension to .exe

  • download zipalign from here and unzip it in the same folder
  • rename the extension to .exe, as shown in the example after this list
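For example, assuming the unzipped binaries are named llamafile-0.8.4 (the release used later in this guide) and zipalign (adjust the filenames to whatever you actually downloaded), the renames from a Windows terminal look like this:

ren llamafile-0.8.4 llamafile-0.8.4.exe
ren zipalign zipalign.exe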

In my case I want the executable to run the API server with a few more arguments (context length).

Create a .args file as explained in Creating Llamafiles.

The file will contain one argument per line; the trailing ... is a placeholder where any extra arguments passed on the command line get inserted:

-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...

In the terminal, run the following to create the base binary:

copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile

Then use zipalign to bundle the GGUF file and the .args file into the llamafile (-j0 stores them uncompressed, so the weights can be mapped directly into memory):

.\zipalign.exe -j0 qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8_0.gguf .args

Finally, rename the .llamafile to .exe:

ren qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8.exe

Run the Qwen model

From the terminal, run:

.\qwen1_5-0_5b-chat-q8.exe --nobrowser

This will load the model and start the web server without opening the browser.
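To sanity-check that the server is reachable before wiring up any client code, you can send a single request from PowerShell (a minimal sketch, assuming the default port 8080 that the Python script below also uses):

$body = '{"model":"local-model","messages":[{"role":"user","content":"Say hello"}]}'
Invoke-RestMethod -Uri "http://localhost:8080/v1/chat/completions" -Method Post -ContentType "application/json" -Body $body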

Python API call

from openai import OpenAI
import sys

# Point to the local server (the llamafile web server defaults to port 8080)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
history = [
    {"role": "system", "content": "You are QWEN05, an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful. Always reply in the language of the instructions."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]
print("\033[92;1m")  # bright green
while True:
    userinput = ""
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.3,
        frequency_penalty=1.4,
        max_tokens=600,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}

    # Print the streamed tokens as they arrive
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

    print("\033[1;30m")  # dark grey
    print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:")
    print("\033[91;1m")  # red
    lines = sys.stdin.readlines()  # read until EOF (Ctrl+D / Ctrl+Z)
    for line in lines:
        userinput += line  # readlines() keeps the trailing newlines
    if lines and "quit!" in lines[0].lower():
        print("\033[0mBYE BYE!")
        break
    # Reset the history each turn: the model only sees the system prompt
    # and the latest user message, keeping the context small
    history = [
        {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    ]
    history.append({"role": "user", "content": userinput})
    print("\033[92;1m")  # back to bright green

The script accepts multi-line entries in the input: when finished, press Ctrl+Z and Enter (Ctrl+D on Unix).

To exit, type quit! followed by Ctrl+Z and Enter.

fabiomatricardi avatar May 15 '24 09:05 fabiomatricardi

Are you able to make a PR proposal with your proposed changes so it can be reviewed and potentially merged in?

mofosyne avatar May 21 '24 16:05 mofosyne

sure... how?

fabiomatricardi avatar May 23 '24 09:05 fabiomatricardi

@jart is this actually more suitable for a wiki instead? If so, could you enable it? This might be more of a freeform doc than formal instructions.

Otherwise, would it make more sense for him to make a new /docs/ folder in the repo?

@fabiomatricardi how experienced are you with making GitHub contributions? If not very experienced then we could try to accommodate, but we do recommend you learn a bit about how to use GitHub pull requests so your contributions can be more easily tracked.

mofosyne avatar May 23 '24 10:05 mofosyne

Hi Brian, I have never done it before, but I can try. I have only maintained my own repositories. Just let me know whether I should create a new folder like llamafile/docs/windowswiki or use a wiki instead. In case I have to go for a pull request, let me know if I should do it on a fork, or whether you will give me write permissions to the repo.

Waiting to hear from you,

Matricardi Fabio


fabiomatricardi avatar May 23 '24 12:05 fabiomatricardi

copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile

Make qwen2_5-7b-chat.llamafile

bphd avatar Dec 19 '24 07:12 bphd