cannot run CLI mode without having to specify the name of the internal weights via `-m`
I was creating https://huggingface.co/mofosyne/TinyLLama-v0-llamafile to port over Maykeye/TinyLLama-v0 text generation, as it would be a good small model for people to quickly download to at least test code generation. However, while the web server mode works (`./TinyLLama-v0-5M-F16.llamafile`), CLI mode does not work as expected:
- `./TinyLLama-v0-5M-F16.llamafile --cli -m TinyLLama-v0-5M-F16.gguf -p "A dog and a cat"` works as expected.
- `./TinyLLama-v0-5M-F16.llamafile --cli -p "A dog and a cat"` fails to open the internal model; it seems to assume the internal zipped model is located at `models/7B/ggml-model-f16.gguf`:
```
$ ./TinyLLama-v0-5M-F16.llamafile --cli -p "A dog and a cat"
Log start
main: llamafile version 0.5.0
main: seed = 1704906994
error: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
```
By the way, do we have any small llamafile in the readme that we can show as an example? If not, then I would like to propose the TinyLLama-v0-5M-F16.llamafile I generated above, at least so people with slow internet can test it (once we fix this issue).
You can fix that by adding a `.args` file to your llamafile.
```sh
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF
zipalign -j0 TinyLLama-v0-5M-F16.llamafile .args
```
Let me know if that doesn't solve it, and I'll reopen.
Still encountering the issue. Basically, the problem doesn't appear if I run the llamafile without arguments. It appears, however, if I pass even one argument, as if it forgets the `.args` baked into it. Below are the steps (also in my repo readme), focusing on baking the gguf into the llamafile (nice tip with the `.args` copying).
... omitted for brevity
```sh
# Copy the generated gguf to this folder
cp maykeye_tinyllama/TinyLLama-v0-5M-F16.gguf TinyLLama-v0-5M-F16.gguf

# Get the llamafile engine
cp /usr/local/bin/llamafile TinyLLama-v0-5M-F16.llamafile

# Create an .args file with settings defaults
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF

# Combine
zipalign -j0 \
  TinyLLama-v0-5M-F16.llamafile \
  TinyLLama-v0-5M-F16.gguf \
  .args

# Test the new llamafile
# It should run through all the .args settings
./TinyLLama-v0-5M-F16.llamafile
```
Hope that gives you a bit more context.
Just in case, I also went into the llamafile repo and re-pulled to rule out any fixes already being included. No change to this bug; it's still present as of now.
```sh
# Sync repo to mainline
git checkout main
git pull

# Rebuild and reinstall llamafile
make -j8
sudo make install PREFIX=/usr/local
```
I did a comparison between my `.args` and a different llamafile's, and noticed I forgot `...`. When I tried a similar command to the above with mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile, it works. This is its `.args` content:
```
$ unzip -p mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile .args
-m
mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf
-c
0
...
```
Mine, which didn't work, is missing `...`:
```
$ unzip -p TinyLLama-v0-5M-F16.llamafile .args
-m
TinyLLama-v0-5M-F16.gguf
```
Adding it in makes it work again. By the way, where in llama.cpp is it looking for `.args`? I'm just wondering, as I was not able to find it in the code when debugging. Now that I know `models/7B/ggml-model-f16.gguf` is the default assumed path if no model is specified (as shown in the `model` string within `struct gpt_params` in llama.cpp/common.h), my current assumption is that it completely ignores `.args` if `...` is missing.
I also just noticed you mentioned in the readme, under Creating llamafiles, that:

> The `...` argument optionally specifies where any additional CLI arguments passed by the user are to be inserted.

In my opinion, if `...` is missing, then it should always be assumed to be appended to the end of the `.args` file.
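Given the current behavior, the safe workaround is to always put an explicit `...` as the last line of the `.args` file, mirroring the mixtral example above, so user-supplied CLI arguments get spliced in rather than ignored. A minimal sketch:

```shell
# Write a .args file whose final line is the literal `...` placeholder.
# Without it, any user CLI argument causes the whole .args file to be skipped.
cat > .args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
...
EOF

# Show the result; the last line should be the three-dot placeholder
cat .args
```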
At the very least, is it possible to include comments in the `.args` file, e.g. lines starting with `#`? It might also be worth adding the notice above about `...` to the readme, if we cannot implicitly append it when missing.
Dev Note: For those wondering what consumes `.args`: it's handled by the cosmopolitan library, which llama.cpp in llamafile uses via `LoadZipArgs()`. The key question I have about this function is whether it skips the args entirely when `...` is missing: https://github.com/jart/cosmopolitan/blob/6715b670b1547aef161af183635de31bd3a0b8d7/tool/args/args.c#L129
Hmmm... it looks like it's explicit behavior of `LoadZipArgs()` to ignore any `.args` settings when `...` is missing from that file and user-specified CLI arguments are provided. In my opinion this is rather unexpected behavior. It really should be a bit smarter and use `.args` only as default values that can be overridden if required. (I may see if there is a way to do that.)
Function: `LoadZipArgs()` (cosmopolitan: `tool/args/args.c`)

Replaces the argument list with the contents of `/zip/.args`, if it exists.

Usage

- The `.args` file should contain one argument per line.
- If `...` is not present in `.args`, replacement occurs only if no CLI arguments are specified.
- If `...` is present, it gets replaced with any user-specified CLI arguments.

Returns

- `0` on success.
- `-1` if `.args` is not found, without altering `errno`.
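The replacement rules summarized above can be emulated in a few lines of shell. This is a hypothetical sketch of the semantics only, not the actual C implementation in `tool/args/args.c` (the function name `effective_args` is mine):

```shell
# effective_args FILE [CLI args...]
# Prints the argument list a program would end up seeing, one per line,
# following the LoadZipArgs() rules:
#   - `...` absent from FILE: FILE's args are used only when no CLI args are given
#   - `...` present in FILE:  it is replaced by the CLI args
effective_args() {
  local file=$1
  shift
  if grep -qxF -- '...' "$file"; then
    # Splice CLI args in at the `...` marker
    while IFS= read -r line; do
      if [ "$line" = "..." ]; then
        if [ "$#" -gt 0 ]; then printf '%s\n' "$@"; fi
      else
        printf '%s\n' "$line"
      fi
    done < "$file"
  elif [ "$#" -eq 0 ]; then
    # No marker and no CLI args: .args is used wholesale
    cat -- "$file"
  else
    # No marker but CLI args given: .args is ignored entirely
    printf '%s\n' "$@"
  fi
}
```

For example, with an `.args` file lacking `...`, `effective_args file.args -p "hello"` prints only `-p` and `hello`, which matches the surprising behavior reported in this issue.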
Okay, I've given the idea of making `...` optional in `.args` a shot, but it feels awfully hackish... https://github.com/Mozilla-Ocho/llamafile/pull/204

I wonder if this could instead be done by getting cosmopolitan to add a new mode to `LoadZipArgs()` that enables an optional `...` in the `.args` file.
After some discussion with Justine, this fix will move upstream to cosmopolitan: https://github.com/jart/cosmopolitan/pull/1086

The changes are now in cosmopolitan (https://github.com/jart/cosmopolitan/pull/1086#event-11492979262) and will eventually be integrated into llamafile on the next toolchain revision. This issue ticket should be considered solved and closed when that happens.
Okay, I was testing whether this is solved by adjusting my build script in https://huggingface.co/mofosyne/TinyLLama-v0-5M-F16-llamafile/blob/main/llamafile-creation.sh so that the `.args` is now:

```sh
cat >.args <<EOF
-m
TinyLLama-v0-5M-F16.gguf
EOF
```
Upon running `./TinyLLama-v0-5M-F16.llamafile --cli -p "hello world the gruff man said"` I got the expected output.

This problem is now officially fixed in llamafile as of around llamafile v0.7.0. Thanks @jart for mainlining the fix!