lmql
Using Ray Serve instead of lmql serve-model
Is it possible to serve a model through Ray Serve instead of lmql serve-model? If so, how should the "from" clause be modified to access the Ray Serve API endpoint?
It looks like Ray Serve offers relatively flexible access to the model. Most importantly, to support it we need access to the next-token distribution and a way to shift it according to a logit bias. @charles-dyfis-net is working on Replicate support, so there may be some parallels here.
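To make the logit bias requirement concrete, here is a minimal sketch of what any backend would need to do at each decoding step. It is plain Python with no Ray Serve or LMQL calls. The function names (`apply_logit_bias`, `next_token_distribution`) and the dict format mapping token ids to biases are assumptions for illustration, not part of either library's API.

```python
import math

def apply_logit_bias(logits, logit_bias):
    """Return a copy of `logits` with per-token biases added.

    `logit_bias` maps token id -> additive bias (hypothetical format).
    A bias of -inf masks the token out entirely.
    """
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        biased[token_id] += bias
    return biased

def softmax(logits):
    """Numerically stable softmax over a list of floats.

    Assumes at least one logit is finite, i.e. not every token is masked.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(logits, logit_bias=None):
    """Shift the raw logits by the bias, then normalize to probabilities."""
    return softmax(apply_logit_bias(logits, logit_bias or {}))

# Example: a 3-token vocabulary where token 2 is masked out.
dist = next_token_distribution([1.0, 2.0, 3.0], {2: float("-inf")})
```

In this sketch the served model would have to return the raw `logits` for the next position (or accept the bias and apply it server-side). An endpoint that only returns sampled text would not be enough.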