
Starting the playground with a self-hosted model

KamilLegault opened this issue 2 years ago · 2 comments

Is there any documentation on how to have the playground connect to a locally hosted model (llama.cpp)? I have not been able to figure out how to do it.

KamilLegault avatar Nov 20 '23 02:11 KamilLegault

Hi @KamilLegault,

you can have a look here: https://lmql.ai/docs/models/llama.cpp.html#model-server.

You can start an LMTP inference endpoint by running:

lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf

In the playground, you then need to specify which model to use, e.g.:

argmax 
    "What is the capital of France? [RESPONSE]"
from 
    lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")
where 
    len(TOKENS(RESPONSE)) < 20

Without local: in front of llama.cpp:, the playground does not load the model itself; it instead looks for that exact model being served by the LMTP inference endpoint, as stated in the documentation.
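If the playground still cannot reach the model, a quick first check is whether anything is listening on the serve-model port at all. Here is a minimal sketch, assuming the endpoint runs on localhost and on port 8080 (an assumption; adjust it to whatever port your lmql serve-model process actually uses):

```python
import socket

def lmtp_endpoint_up(host="localhost", port=8080, timeout=2.0):
    """Return True if something is accepting TCP connections at host:port.

    port=8080 is an assumed default for the serve-model endpoint;
    change it if you started `lmql serve-model` on a different port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only confirms the endpoint process is reachable, not that the model loaded successfully; for the latter, check the serve-model console output.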

Hope that helps :)

Best Leon

reuank avatar Nov 20 '23 19:11 reuank

In addition to what @reuank said, you can also specify the default model for the playground on launch.

For instance:

LMQL_DEFAULT_MODEL='local:gpt2' lmql playground

Queries without a from clause will then use local:gpt2 by default.
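The same environment variable can be set when launching the playground from Python instead of a shell. A minimal sketch, assuming lmql is installed and on the PATH (the helper name playground_env is hypothetical, not part of the lmql API):

```python
import os

def playground_env(default_model="local:gpt2"):
    """Build the environment for launching `lmql playground`.

    Setting LMQL_DEFAULT_MODEL makes queries without a `from` clause
    fall back to this model, as in the shell example above.
    """
    env = dict(os.environ)
    env["LMQL_DEFAULT_MODEL"] = default_model
    return env

# Usage (assumes the lmql CLI is installed and on PATH):
# import subprocess
# subprocess.Popen(["lmql", "playground"], env=playground_env())
```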

lbeurerkellner avatar Nov 22 '23 16:11 lbeurerkellner