Luca Beurer-Kellner
Hi there. There are several knobs you can turn to get better throughput; the performance you are seeing can definitely be improved by a *lot*:

* The `chunksize=`...
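To illustrate the `chunksize=` knob mentioned above, here is a minimal sketch, assuming `chunksize=` is passed as a decoder argument (the model name, the value 128 and the constraint are placeholders, not taken from your setup):

```
argmax(chunksize=128)
   "Say 'Hello World':[RESPONSE]"
from
   "gpt2-medium"
where
   len(TOKENS(RESPONSE)) < 64
```

Roughly speaking, a larger chunk size means fewer round trips to the model per variable, at the cost of sometimes decoding tokens that a constraint or stopping condition later cuts off.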
I typically run `serve-model` remotely with `ssh -L 8080:localhost:8080`, which should do the trick. Can you share more information about your SSH setup? Are there possibly limitations on the type...
Thanks a lot for the repo; I will definitely have a closer look. It would be awesome to do a bit of performance engineering/profiling work here, as a case study,...
Coming back to this: on a second look, it appears that you are directly visiting localhost:8001. Could you clarify how you are trying to make use of the remote model...
I see, yes. The problem here is that we currently do not support a `from` clause when you omit the decoder keyword. So instead, just write:

```
argmax "Hello[WHO]" from ...
```
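For reference, a full query of this shape might look roughly like the following (the model name and the stopping condition are just placeholders):

```
argmax
   "Hello[WHO]"
from
   "gpt2-medium"
where
   STOPS_AT(WHO, "\n")
```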
Hi there, I haven't found the time to dig deeper into the repo, sorry about that. Still, it's great to hear that things are moving forward. Thanks a lot for the...
PRs are very welcome :) I think it definitely makes sense; however, you have to handle the encoder/decoder separation of the input somehow.
Thanks for your continued investigation. An update on my end: I now have a setup with the `debug` branch and Pythia and am doing test runs on my machine. I...
Thanks for raising this. According to LMQL semantics, the constraint you specified means that your variable value will be at least 100 tokens long and, once this limit is reached,...
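Assuming the constraint in question is a token-length comparison along the lines of `len(TOKENS(...)) > 100`, the sketch below shows how the two directions differ (variable, prompt and model names are placeholders):

```
# len(TOKENS(ANSWER)) > 100 forces the value to run past 100 tokens;
# to cap it at 100 tokens instead, invert the comparison:
argmax
   "Q: Why is the sky blue?\nA:[ANSWER]"
from
   "gpt2-medium"
where
   len(TOKENS(ANSWER)) < 100
```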
Thanks for reporting. Marking this as a good first issue. The fix is likely somewhere close to https://github.com/eth-sri/lmql/blob/main/src/lmql/language/compiler.py#L428, where we compile LLM query strings into multi-line strings in the compiled...