lmql
Use typing in server/client
TL;DR
As of now, the server code is pretty hard to comprehend.
I propose adding typing. I can rewrite the server using FastAPI: in addition to specifying the request/response schema, it serves a /docs endpoint that contains examples, so there is no need to use Postman/curl. It would also be easier to maintain, because someone using an incompatible version would get a ValidationError instead of some other runtime error in some other place.
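To make the ValidationError point concrete, here is a minimal, dependency-free sketch of validating a request at the server boundary. FastAPI with pydantic models would do this kind of check automatically; the sketch only uses the standard library so it runs anywhere. The `GenerateRequest` name, its fields, and the local `ValidationError` class are hypothetical and are not the actual LMQL server schema.

```python
from dataclasses import dataclass, fields
from typing import List


class ValidationError(ValueError):
    """Raised when a payload does not match the declared schema."""


@dataclass
class GenerateRequest:
    # Hypothetical fields for illustration only.
    model: str
    input_ids: List[int]
    temperature: float = 1.0

    @classmethod
    def parse(cls, payload: dict) -> "GenerateRequest":
        # Reject fields the schema does not know about (e.g. from a newer client).
        known = {f.name for f in fields(cls)}
        unknown = set(payload) - known
        if unknown:
            raise ValidationError(f"unknown fields: {sorted(unknown)}")
        try:
            req = cls(**payload)
        except TypeError as exc:  # a required field is missing
            raise ValidationError(str(exc)) from exc
        # Check the types of the values that were supplied.
        if not isinstance(req.model, str):
            raise ValidationError("model must be a string")
        if not (isinstance(req.input_ids, list)
                and all(isinstance(i, int) for i in req.input_ids)):
            raise ValidationError("input_ids must be a list of integers")
        if not isinstance(req.temperature, (int, float)):
            raise ValidationError("temperature must be a number")
        return req
```

A client that sends an outdated payload such as `{"model": "gpt2", "prompt": "hi"}` fails immediately in `GenerateRequest.parse` with a message naming the offending field, rather than deeper in the decoding code.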
Example where it would help - adding a new model interface
I've started working on adding RWKV integration, but I've run into the following problems:
- both the server and the client code need to be changed; I think I've updated the server successfully, but it doesn't work on the client side
- RWKV uses tokenizers.Tokenizer, which has a different interface from the tokenizer classes actually instantiated in transformers
- I have problems with dclib (what's up with that, actually? Is dclib a kind of abstraction over different models?)
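One way to bridge the tokenizer mismatch is a thin adapter. The sketch below assumes the main difference is the return type of `encode`: `tokenizers.Tokenizer.encode` returns an `Encoding` object whose ids are in `.ids`, while transformers tokenizers return a plain list of ints. The `TokenizerAdapter` class is hypothetical and not part of LMQL; the real integration may need more than this (special tokens, vocabulary access, and so on).

```python
from typing import Any, List


class TokenizerAdapter:
    """Hypothetical wrapper exposing encode()/decode() on plain id lists."""

    def __init__(self, tokenizer: Any) -> None:
        self._tok = tokenizer

    def encode(self, text: str) -> List[int]:
        out = self._tok.encode(text)
        # tokenizers.Tokenizer.encode returns an Encoding with an .ids attribute;
        # transformers tokenizers return a list of ints directly.
        ids = getattr(out, "ids", out)
        return list(ids)

    def decode(self, ids: List[int]) -> str:
        # Both tokenizer flavours accept a list of ids here.
        return self._tok.decode(ids)
```

With this in place, the rest of the client could call `adapter.encode(text)` and always get a list of ints, regardless of which tokenizer library backs the model.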
@lbeurerkellner could you point me in the right direction?
Please let me know what you think.
Anyway, happy Easter!
Wow, thanks for getting right into it. I won't have much time to look into it over the weekend, but I will answer more concretely next week. Happy Easter to you as well.
dclib is an array-based library for implementing decoding algorithms independently from model backends and e.g. control flow logic. More on this, a bit more publicly, soon.
I've written a synchronous version (I had problems making the queues work) that can be used to illustrate the point.
Thanks for the work. Can you comment on how FastAPI compares to e.g. gRPC with respect to throughput and latency? We are currently planning to optimise the LMQL Inference API or even switch to an established solution altogether.
With the updated inference infrastructure, the API has been replaced by a socket-based custom protocol, i.e. LMTP: https://github.com/eth-sri/lmql/tree/main/src/lmql/models/lmtp