cannot run CLI mode without having to specify the name of the internal weights via `-m`
I was creating https://huggingface.co/mofosyne/TinyLLama-v0-llamafile to port over Maykeye/TinyLLama-v0 text generation, as it would be a good small model for people to quickly download to at least test code generation. However, while the web server mode works (`./TinyLLama-v0-5M-F16.llamafile`), CLI mode does not work as expected:
- `./TinyLLama-v0-5M-F16.llamafile --cli -m TinyLLama-v0-5M-F16.gguf -p "A dog and a cat"` works as expected.
- `./TinyLLama-v0-5M-F16.llamafile --cli -p "A dog and a cat"` fails to open the internal model; it seems to assume the internal zipped model is located at `models/7B/ggml-model-f16.gguf`:
```
$ ./TinyLLama-v0-5M-F16.llamafile --cli -p "A dog and a cat"
Log start
main: llamafile version 0.5.0
main: seed = 1704906994
error: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
```
By the way, do we have any small llamafile in the readme that we can show as an example? If not, then I would like to propose the TinyLLama-v0-5M-F16.llamafile I generated above, at least so people with slow internet can test it (once we fix this issue).
You can fix that by adding a `.args` file to your llamafile.
```sh
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF
zipalign -j0 TinyLLama-v0-5M-F16.llamafile .args
```
Let me know if that doesn't solve it, and I'll reopen.
Still encountering the issue. Basically, the problem doesn't appear if I run the llamafile without arguments. It appears, however, if I pass even one argument, as if it forgets the `.args` baked into it. Below are the steps (also in my repo readme), focusing on baking the gguf into the llamafile (nice tip with the `.args` copying).
... omitted for brevity
```sh
# Copy the generated gguf to this folder
cp maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf TinyLLama-v0-5M-F16.gguf

# Get the llamafile engine
cp /usr/local/bin/llamafile TinyLLama-v0-5M-F16.llamafile

# Create an .args file with settings defaults
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF

# Combine
zipalign -j0 \
  TinyLLama-v0-5M-F16.llamafile \
  TinyLLama-v0-5M-F16.gguf \
  .args

# Test the new llamafile
# It should run through all the .args settings
./TinyLLama-v0-5M-F16.llamafile
```
Hope that gives you a bit more context.
Just in case, I also went into the llamafile repo and re-pulled to rule out any fixes already being included. No change to this bug; it's still present as of now.
```sh
# Sync repo to mainline
git checkout main
git pull

# Rebuild and reinstall llamafile
make -j8
sudo make install PREFIX=/usr/local
```
I did a comparison between my `.args` and a different llamafile's, and noticed I forgot `...`. When I tried a similar command to the above with mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile, it works. This is its `.args` content:
```
$ unzip -p mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile .args
-m
mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf
-c
0
...
```
Mine, which didn't work, is missing `...`:
```
$ unzip -p TinyLLama-v0-5M-F16.llamafile .args
-m
TinyLLama-v0-5M-F16.gguf
```
Adding it in makes it work again. By the way, where in llama.cpp is it looking for `.args`? I'm just wondering, as I was not able to find it in the code when debugging. Now that I know `models/7B/ggml-model-f16.gguf` is the default assumed path if no model is specified (as shown in the `model` string within `struct gpt_params` in llama.cpp/common.h), my current assumption is that it completely ignores `.args` if `...` is missing.
I also just noticed you mentioned in the readme, under Creating llamafiles, that:

> The `...` argument optionally specifies where any additional CLI arguments passed by the user are to be inserted.

In my opinion, if `...` is missing, then it should always be assumed to be appended to the end of the `.args` file.
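Given the current behavior, the safe workaround is to always put an explicit `...` as the last line of the `.args` file, mirroring the mixtral example above, so user-supplied CLI arguments get spliced in rather than ignored. A minimal sketch:

```shell
# Write a .args file whose final line is the literal `...` placeholder.
# Without it, any user CLI argument causes the whole .args file to be skipped.
cat > .args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
...
EOF

# Show the result; the last line should be the three-dot placeholder
cat .args
```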
At the very least, is it possible to include comments in the `.args` file, e.g. lines starting with `#`? It might also be worth adding the notice above about `...` to the readme, if we cannot implicitly append it when missing.
Dev Note: For those wondering what consumes `.args`: it's handled by the cosmopolitan library, which llama.cpp in llamafile uses via `LoadZipArgs()`. The key question I have about this function is whether it skips the args entirely when `...` is missing: https://github.com/jart/cosmopolitan/blob/6715b670b1547aef161af183635de31bd3a0b8d7/tool/args/args.c#L129
Hmmm... it looks like it's explicit behavior of `LoadZipArgs()` to ignore any `.args` settings when `...` is missing from that file and user-specified CLI arguments are provided. In my opinion this is rather unexpected behavior. It really should be a bit smarter and use `.args` only as default values that can be overridden if required. (I may see if there is a way to do that.)
Function: `LoadZipArgs()` (cosmopolitan: `tool/args/args.c`)

Replaces the argument list with the contents of `/zip/.args`, if it exists.

Usage

- The `.args` file should contain one argument per line.
- If `...` is not present in `.args`, replacement occurs only if no CLI arguments are specified.
- If `...` is present, it gets replaced with any user-specified CLI arguments.

Returns

- `0` on success.
- `-1` if `.args` is not found, without altering `errno`.
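The replacement rules summarized above can be emulated in a few lines of shell. This is a hypothetical sketch of the semantics only, not the actual C implementation in `tool/args/args.c` (the function name `effective_args` is mine):

```shell
# effective_args FILE [CLI args...]
# Prints the argument list a program would end up seeing, one per line,
# following the LoadZipArgs() rules:
#   - `...` absent from FILE: FILE's args are used only when no CLI args are given
#   - `...` present in FILE:  it is replaced by the CLI args
effective_args() {
  local file=$1
  shift
  if grep -qxF -- '...' "$file"; then
    # Splice CLI args in at the `...` marker
    while IFS= read -r line; do
      if [ "$line" = "..." ]; then
        if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; fi
      else
        printf '%s\n' "$line"
      fi
    done < "$file"
  elif [ "$#" -eq 0 ]; then
    # No marker and no CLI args: .args is used wholesale
    cat -- "$file"
  else
    # No marker but CLI args given: .args is ignored entirely
    printf '%s\n' "$@"
  fi
}
```

For example, with an `.args` file lacking `...`, `effective_args file.args -p "hello"` prints only `-p` and `hello`, which matches the surprising behavior reported in this issue.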
Okay, I've given the idea of making `...` optional in `.args` a shot, but it feels awfully hackish... https://github.com/Mozilla-Ocho/llamafile/pull/204

I wonder if this could instead be done by getting cosmopolitan to add a new mode to `LoadZipArgs()` that enables an optional `...` in the `.args` file.
After some discussion with Justine, this fix will move upstream to cosmopolitan: https://github.com/jart/cosmopolitan/pull/1086

The changes are now in cosmopolitan (https://github.com/jart/cosmopolitan/pull/1086#event-11492979262) and will eventually be integrated into llamafile on the next toolchain revision. This issue ticket should be considered solved and closed when that happens.
Okay, I was testing whether this is solved by adjusting my build script in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/llamafile-creation.sh so that the `.args` is now:

```sh
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF
```
Upon running `./TinyLLama-v0-5M-F16.llamafile --cli -p "hello world the gruff man said"` I got the expected output.

This problem is now officially fixed in llamafile as of around llamafile v0.7.0. Thanks @jart for mainlining the fix!