Luca Beurer-Kellner
If you want to know how a standalone playground distribution (for in-browser use) can be packaged up, have a look at https://github.com/eth-sri/lmql/blob/main/web/deploy.sh. If you don't set REACT_APP_WEB_BUILD=1, then the resulting...
Wow, thanks for getting right into it. I won't have much time to look into it over the weekend, but I will answer more concretely next week. Happy Easter to...
Thanks for the work. Can you comment on how FastAPI compares to, e.g., gRPC with respect to throughput and latency? We are currently planning to optimise the LMQL Inference API...
With the updated inference infrastructure, the API has been replaced by LMTP, a custom socket-based protocol: https://github.com/eth-sri/lmql/tree/main/src/lmql/models/lmtp
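For illustration, a rough sketch of what a client for a socket-based token-streaming protocol like this could look like. The message types (`generate`, `token`, `done`) and the `websockets` usage are placeholders of my own, not the actual LMTP wire format; see the linked directory for the real specification.

```python
# Illustrative only: the message types below are invented placeholders,
# NOT the actual LMTP message format (see the lmtp directory for that).
import asyncio
import json

import websockets  # pip install websockets


async def stream_tokens(uri: str, prompt: str):
    async with websockets.connect(uri) as ws:
        # ask the server to start generating for this prompt
        await ws.send(json.dumps({"type": "generate", "prompt": prompt}))
        # consume tokens as they are streamed back over the socket
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "token":
                print(msg["text"], end="", flush=True)
            elif msg.get("type") == "done":
                break


asyncio.run(stream_tokens("ws://localhost:8080", "Hello"))
```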
Hi Chris, thanks for raising this; I suspect (2) is the issue here. Can you point me to the resources that would allow one to obtain your version of llama/hf_llama7b? We...
Interestingly, I can't get a `LlamaTokenizer` to work on my machine. For example, this code never finishes executing, and depending on the environment, `tokenizer.bos_token_id` sometimes spirals into a recursion loop in HF...
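For reference, a minimal repro of the kind of snippet I mean (the model path is a placeholder, not my actual setup):

```python
# Minimal repro sketch; the path is a placeholder for a local llama checkpoint.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/path/to/llama-7b")  # never returns
print(tokenizer.bos_token_id)  # or, depending on the environment, recurses here
```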
Thanks for the instructions, I will have a look soon. `lmql.model.serve.TokenizerProcessor` is outdated and should be removed, so there is no need to refactor it. I think the inference server...
The underlying issue of this bug report has been fixed in the latest version, together with the addition of llama.cpp as a model inference backend. If you want to use llama...
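As a rough sketch, running against the llama.cpp backend can look like the query below. The weights path is a placeholder, and the exact model identifier format may differ between versions, so please check the docs for your setup:

```
argmax
    "Say 'this is a test': [RESPONSE]"
from
    "llama.cpp:/path/to/llama-7b.bin"
where
    len(TOKENS(RESPONSE)) < 20
```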
I think there may be an issue with worker threads being shut down after the first completion. I will investigate a bit and report here. For now, usually when running...
I see. For this, you may want to have a look at how we bridge async/non-async code for the langchain integration, see https://github.com/eth-sri/lmql/blob/main/src/lmql/runtime/langchain.py. Basically, if possible, try re-using the existing event loop. Still,...
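To illustrate the idea (this is a generic sketch of my own, not the actual code in langchain.py): rather than spinning up a fresh event loop per call, check for a running loop first and only block when there is none.

```python
import asyncio


def bridge(coro):
    """Run an async coroutine from sync code, re-using a running loop if any.
    Generic illustration only -- not the actual bridging code in langchain.py."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop is None:
        # no event loop is running in this thread: safe to block until done
        return asyncio.run(coro)

    # a loop is already running (e.g. we were called from async code):
    # schedule the coroutine on it instead of starting a second loop
    return asyncio.ensure_future(coro)
```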