Justine Tunney

655 comments by Justine Tunney

Here's 453ms latency to generate a single token using a 14 gig model:

```
jart@luna:~/llamafile$ rusage o//llama.cpp/main/main -m /weights/Mistral-7B-Instruct-v0.3.BF16.gguf --cli -n 1 --log-disable --temp 0 --special
Question took 449,432µs...
```

83 seconds to load a 3.6 GB file works out to roughly 43 MB/s. Do you have a 5400 rpm disk connected over gigabit ethernet? You're going to pay that cost no matter what you do. If...

I don't know what provisioned concurrency is. But I'd assume that, with warmup removed, you would have some other system send the warmup request automatically, and then you'd block any user...

You have the opportunity to be the first person to productionize the brand-new llamafile server v2.0 that I'm working on. So far it has an `/embedding` endpoint. Embedding models...

Oh, there's also a tokenization endpoint:

```
jtunn@gothbox:~$ curl http://127.0.0.1:8080/tokenize?prompt=hello+world
{
  "add_special": true,
  "parse_special": false,
  "tokens": [
    "[CLS]",
    " hello",
    " world",
    "[SEP]"
  ]
}
```
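The `/embedding` endpoint mentioned above presumably follows the same query-string pattern; here's a minimal sketch, with the caveat that the parameter name (`content`) is an assumption rather than something shown in these comments:

```
# Hypothetical sketch of calling the /embedding endpoint; the endpoint path is
# from the comment above, but the "content" parameter name is an assumption.
curl "http://127.0.0.1:8080/embedding?content=hello+world"
```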

`o//llamafile/server/main` has to be built from source. In the future, it'll be called `llamafile --server`. But right now it's a separate binary that's independent of our releases. The current `llamafile...
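As a rough sketch of what building that binary from source might look like (the clone URL is the real llamafile repo, but the exact make target is an assumption inferred from the `o//llamafile/server/main` path above):

```
# Rough sketch, assuming the repo's usual make-based build; the target name is
# inferred from the o//llamafile/server/main path mentioned above.
git clone https://github.com/Mozilla-Ocho/llamafile
cd llamafile
make -j8 o//llamafile/server/main
o//llamafile/server/main -m /weights/your-model.gguf   # model path is a placeholder
```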

Sure I can do that. I'll just add it as a flag, as you suggested, since I think doing the warmup is good in general.

Thanks for your patience. I've added the warmup flag. Let me know if there are any issues with it. As for the warmup endpoint, could you just try that by sending...
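For reference, the "send a warmup request yourself" approach from earlier in the thread can be sketched like this; the endpoint paths come from the examples above, and the `/embedding` parameter name is still an assumption:

```
#!/bin/sh
# Sketch of warming the server by hand: start it, wait until it answers, then
# send one request so the first real user doesn't pay the cold-start cost.
# Endpoint paths are from the comments above; the "content" parameter is an
# assumption, and the model path is a placeholder.
o//llamafile/server/main -m /weights/your-model.gguf &
until curl -fsS "http://127.0.0.1:8080/tokenize?prompt=ready" >/dev/null 2>&1; do
  sleep 1
done
curl -fsS "http://127.0.0.1:8080/embedding?content=warmup" >/dev/null
echo "warmed up"
```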

Also this change just got shipped in our 0.8.12 release. Enjoy!

I don't think automating sync is realistically going to happen. Upstream made a lot of changes we can't agree to, such as CUDA code size being too large, server having...