Starting the playground with a self-hosted model
Is there any documentation on how to have the playground connect to a locally hosted model (llama.cpp)? I have not been able to figure out how to do it.
Hi @KamilLegault,
you can have a look here: https://lmql.ai/docs/models/llama.cpp.html#model-server.
You can start an LMTP inference endpoint by running
lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf
In the playground, you then need to specify which model to use, e.g.:
argmax
"What is the capital of France? [RESPONSE]"
from
lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")
where
len(TOKENS(RESPONSE)) < 20
Without local: in front of llama.cpp:, the playground does not load the model in-process; instead, it will look for that exact model on the running LMTP inference endpoint, as stated in the documentation.
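To illustrate the distinction, here is a minimal, hypothetical sketch of how a local: prefix can route a model identifier either to an in-process backend or to a running inference endpoint. This is not LMQL's actual internals; the function name and return values are illustrative assumptions.

```python
def resolve_backend(model_id: str):
    """Illustrative sketch only, not LMQL's real implementation.

    'local:' means the model is loaded in-process; without it, the
    identifier is expected to match a model served by a running
    LMTP inference endpoint.
    """
    prefix = "local:"
    if model_id.startswith(prefix):
        return ("in-process", model_id[len(prefix):])
    return ("lmtp-endpoint", model_id)

# 'local:gpt2' would be loaded in-process; plain 'gpt2' would be
# looked up on the inference endpoint.
print(resolve_backend("local:gpt2"))  # ('in-process', 'gpt2')
print(resolve_backend("gpt2"))        # ('lmtp-endpoint', 'gpt2')
```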
Hope that helps :)
Best Leon
In addition to what @reuank said, you can also specify the default model for the playground on launch.
For instance:
LMQL_DEFAULT_MODEL='local:gpt2' lmql playground
This way, queries without a from clause will also use local:gpt2 by default.
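Conceptually, that default-model lookup amounts to reading an environment variable and falling back when it is unset. LMQL_DEFAULT_MODEL is the real variable; the helper function and its fallback behavior below are illustrative assumptions, not LMQL's actual code.

```python
import os

def playground_default_model(env=None):
    # Illustrative sketch: LMQL_DEFAULT_MODEL is the real environment
    # variable; this lookup-with-fallback logic is an assumption.
    env = os.environ if env is None else env
    # Returns None when no default model is configured.
    return env.get("LMQL_DEFAULT_MODEL")

print(playground_default_model({"LMQL_DEFAULT_MODEL": "local:gpt2"}))  # local:gpt2
```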