Luca Beurer-Kellner
Hi there. There are several knobs you can turn to get better throughput; the performance you are seeing can definitely be improved by a *lot*:

* The `chunksize=`...
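To illustrate the `chunksize=` knob mentioned above, here is a minimal sketch, assuming `chunksize=` is passed as a decoder argument (the model name, the value 128 and the constraint are placeholders, not taken from your setup):

```
argmax(chunksize=128)
   "Say 'Hello World':[RESPONSE]"
from
   "gpt2-medium"
where
   len(TOKENS(RESPONSE)) < 64
```

Roughly speaking, a larger chunk size means fewer round trips to the model per variable, at the cost of sometimes decoding tokens that a constraint or stopping condition later cuts off.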
I typically run `serve-model` remotely with `ssh -L 8080:localhost:8080`, which should do the trick. Can you share more information about your SSH setup? Are there possibly limitations on the type...
Thanks a lot for the repo; I will definitely have a closer look. It would be awesome to do a bit of performance engineering/profiling work here, as a case study,...
Coming back to this: on a second look, it appears that you are directly visiting localhost:8001. Could you clarify how you are trying to make use of the remote model...
I see, yes. The problem here is that we currently do not support a `from` clause when you omit the decoder keyword. So instead, just write:

```
argmax "Hello[WHO]" from ...
```
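For reference, a full query of this shape might look roughly like the following (the model name and the stopping condition are just placeholders):

```
argmax
   "Hello[WHO]"
from
   "gpt2-medium"
where
   STOPS_AT(WHO, "\n")
```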
Hi there, I haven't found the time to dig deeper into the repo, sorry about that. Still, it's great to hear that things are moving forward. Thanks a lot for the...
PRs are very welcome :) I think it definitely makes sense; however, you have to handle the encoder/decoder separation of the input somehow.
Thanks for your continued investigation. An update on my end: I now have a setup with the `debug` branch and Pythia and am doing test runs on my machine. I...
Thanks for raising this. According to LMQL semantics, the constraint you specified means that your variable value will be at least 100 tokens long and, once this limit is reached,...
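Assuming the constraint in question is a token-length comparison along the lines of `len(TOKENS(...)) > 100`, the sketch below shows how the two directions differ (variable, prompt and model names are placeholders):

```
# len(TOKENS(ANSWER)) > 100 forces the value to run past 100 tokens;
# to cap it at 100 tokens instead, invert the comparison:
argmax
   "Q: Why is the sky blue?\nA:[ANSWER]"
from
   "gpt2-medium"
where
   len(TOKENS(ANSWER)) < 100
```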
Thanks for reporting. Marking this as a good first issue. The fix is likely somewhere close to https://github.com/eth-sri/lmql/blob/main/src/lmql/language/compiler.py#L428, where we compile LLM query strings into multi-line strings in the compiled...