Luca Beurer-Kellner
If you want to know how a standalone playground distribution (for in-browser use) can be packaged up, have a look at https://github.com/eth-sri/lmql/blob/main/web/deploy.sh. If you don't set REACT_APP_WEB_BUILD=1, then the resulting...
Wow, thanks for getting right into it. I won't have much time to look into it over the weekend, but I will answer more concretely next week. Happy Easter to...
Thanks for the work. Can you comment on how FastAPI compares to, e.g., gRPC with respect to throughput and latency? We are currently planning to optimise the LMQL Inference API...
With the updated inference infrastructure, the API has been replaced by LMTP, a custom socket-based protocol: https://github.com/eth-sri/lmql/tree/main/src/lmql/models/lmtp
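For illustration, a rough sketch of what a client for a socket-based token-streaming protocol like this could look like. The message types (`generate`, `token`, `done`) and the `websockets` usage are placeholders of my own, not the actual LMTP wire format; see the linked directory for the real specification.

```python
# Illustrative only: the message types below are invented placeholders,
# NOT the actual LMTP message format (see the lmtp directory for that).
import asyncio
import json

import websockets  # pip install websockets


async def stream_tokens(uri: str, prompt: str):
    async with websockets.connect(uri) as ws:
        # ask the server to start generating for this prompt
        await ws.send(json.dumps({"type": "generate", "prompt": prompt}))
        # consume tokens as they are streamed back over the socket
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "token":
                print(msg["text"], end="", flush=True)
            elif msg.get("type") == "done":
                break


asyncio.run(stream_tokens("ws://localhost:8080", "Hello"))
```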
Hi Chris, thanks for raising this; I suspect (2) is the issue here. Can you point me to the resources that would allow one to obtain your version of llama/hf_llama7b? We...
Interestingly, I can't get a `LlamaTokenizer` to work on my machine. For example, this code never finishes executing, and depending on the environment, `tokenizer.bos_token_id` sometimes spirals into a recursion loop in HF...
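For reference, a minimal repro of the kind of snippet I mean (the model path is a placeholder, not my actual setup):

```python
# Minimal repro sketch; the path is a placeholder for a local llama checkpoint.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/path/to/llama-7b")  # never returns
print(tokenizer.bos_token_id)  # or, depending on the environment, recurses here
```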
Thanks for the instructions, I will have a look soon. `lmql.model.serve.TokenizerProcessor` is outdated and should be removed, so there is no need to refactor it. I think the inference server...
The underlying issue of this bug report has been fixed in the latest version, together with the addition of llama.cpp as a model inference backend. If you want to use llama...
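As a rough sketch, running against the llama.cpp backend can look like the query below. The weights path is a placeholder, and the exact model identifier format may differ between versions, so please check the docs for your setup:

```
argmax
    "Say 'this is a test': [RESPONSE]"
from
    "llama.cpp:/path/to/llama-7b.bin"
where
    len(TOKENS(RESPONSE)) < 20
```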
I think there may be an issue with worker threads being shut down after the first completion. I will investigate a bit and report here. For now, usually when running...
I see. For this, you may want to have a look at how we bridge async/non-async code for the langchain integration, see https://github.com/eth-sri/lmql/blob/main/src/lmql/runtime/langchain.py. Basically, if possible, try re-using the existing event loop. Still,...
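To illustrate the idea (this is a generic sketch of my own, not the actual code in langchain.py): rather than spinning up a fresh event loop per call, check for a running loop first and only block when there is none.

```python
import asyncio


def bridge(coro):
    """Run an async coroutine from sync code, re-using a running loop if any.
    Generic illustration only -- not the actual bridging code in langchain.py."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop is None:
        # no event loop is running in this thread: safe to block until done
        return asyncio.run(coro)

    # a loop is already running (e.g. we were called from async code):
    # schedule the coroutine on it instead of starting a second loop
    return asyncio.ensure_future(coro)
```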