exllama

Request for server API script without sessions

Open CORRUPTOR2037 opened this issue 2 years ago • 5 comments

So I could just send a simple request and get a simple response in free-form mode, without any additional context.

CORRUPTOR2037, Jun 20 '23 19:06

This is pretty much what the example_flask.py script does. Is that what you're after?

turboderp, Jun 20 '23 20:06

Oh god, really, it is, I'm sorry. But it would be even better if I could pass generation settings with the inference POST request, or set them in a separate request, and provide the model path to the script via arguments. Like a real API server. I know it's not a big deal and I can rewrite it myself, but it would be so much better for everyone if such a script were available out of the box.
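To make the request concrete, here is a minimal sketch of what such a stateless server could look like, using only the Python standard library. Everything in it is an assumption for illustration: the `/infer` route, the `--model` and `--port` flags, the setting names in `DEFAULT_SETTINGS`, and the `generate` placeholder, which echoes the prompt instead of calling exllama. It is not the code in example_flask.py.

```python
import argparse
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical defaults; these names are illustrative, not exllama's actual settings.
DEFAULT_SETTINGS = {"temperature": 0.7, "top_p": 0.9, "max_new_tokens": 128}


def merge_settings(defaults, overrides):
    """Return a copy of defaults updated with known keys from overrides."""
    merged = dict(defaults)
    for key, value in (overrides or {}).items():
        if key in merged:  # unknown keys are silently ignored
            merged[key] = value
    return merged


def generate(prompt, settings):
    """Placeholder for the real model call; echoes the prompt so the sketch runs without a model."""
    return f"[{settings['max_new_tokens']} tokens max] {prompt}"


def handle_infer(payload, defaults=DEFAULT_SETTINGS):
    """Stateless inference: each request carries its own prompt and settings."""
    settings = merge_settings(defaults, payload.get("settings"))
    return {"text": generate(payload["prompt"], settings), "settings": settings}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/infer":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(handle_infer(payload)).encode("utf-8")
        except (ValueError, KeyError, AttributeError):
            self.send_error(400, "expected JSON with a 'prompt' field")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    """Parse arguments and serve; call this from a __main__ guard to run as a script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True, help="path to the model directory")
    parser.add_argument("--port", type=int, default=5000)
    args = parser.parse_args()
    # Loading the model from args.model would happen here (omitted in this sketch).
    HTTPServer(("127.0.0.1", args.port), Handler).serve_forever()
```

With `main()` running, a request body such as `{"prompt": "Hello", "settings": {"max_new_tokens": 16}}` posted to `/infer` would return the generated text together with the settings that were actually applied. No session state is kept between requests.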

CORRUPTOR2037, Jun 20 '23 20:06

I actually thought about adding that. But I was torn because I also liked the example being really simple. I guess it would be quick enough to add a basic API server, though. I'll look into it in a bit.

turboderp, Jun 20 '23 20:06

Thanks. Jumped here from oobabooga's; I really like your minimal-dependency approach, in contrast with the enormous blob of a hundred dependencies mixed with conda. Keep it up!

CORRUPTOR2037, Jun 20 '23 20:06

I posted it in another issue at some point, but I have a script which sounds like what you want here: https://gist.github.com/BlankParenthesis/4f490630b6307ec441364ab64f3ce900

It mimics ooba's text-generation-webui API, so it works with tavern (the main difference being that the streaming API is on the same port as the regular API).

I tried to keep it simple, but I ended up having to make some changes for it to work nicely. Some of those workarounds make the generate function a bit more complex than I'd like, so perhaps there's a nicer way to do it, but this seems to work at least. The settings mismatch is also a bit ugly: by following the existing format, it excludes exllama-specific settings and ignores others.

There's no way to set the model using this, but adding a route for that should be simple enough if you want.
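As a rough idea of what a model-switching route could involve, the sketch below keeps the currently loaded model in a small holder object and validates the request before swapping. The names `ModelHolder` and `handle_set_model`, the `path` field, and the `loader` callable are all hypothetical; the real loading code would be whatever the script already uses, and this is not taken from the gist.

```python
import threading


class ModelHolder:
    """Holds the currently loaded model; loader is any callable taking a path."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self.path = None
        self.model = None

    def set_model(self, path):
        # Load first, so a failed load leaves the previous model in place.
        new_model = self._loader(path)
        with self._lock:
            self.path, self.model = path, new_model
        return path


def handle_set_model(holder, payload):
    """Body of a hypothetical model-switching route; returns (status, response dict)."""
    path = payload.get("path")
    if not isinstance(path, str) or not path:
        return 400, {"error": "expected a non-empty 'path' string"}
    try:
        holder.set_model(path)
    except OSError as exc:  # e.g. the model directory does not exist
        return 500, {"error": str(exc)}
    return 200, {"model": holder.path}
```

A route handler would parse the JSON body, pass it to `handle_set_model`, and send back the returned status and dictionary. Loading before taking the lock means requests are not blocked while a large model loads, at the cost of briefly holding two models in memory.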

BlankParenthesis, Jun 21 '23 03:06