Jlama
streaming server support?
Is there a way to run and expose an API streaming server compatible with OpenAI API specifications?
Probably. Here's the current API call for chat:
https://github.com/tjake/Jlama/blob/main/jlama-cli/src/main/java/com/github/tjake/jlama/cli/serve/GenerateResource.java
I want this feature too
I am pretty sure that this would (at least) require Generator#generate to be enhanced with a callback that is called when the generation is complete.
You mean for stream=false?
For both :)
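To make the callback idea concrete, here is a minimal sketch of how one completion callback could serve both modes. Everything in it is an assumption for illustration: `TokenGenerator`, `ECHO`, `streamed`, and `buffered` are hypothetical names, not Jlama's actual `Generator#generate` signature. The JSON is trimmed to the fields an OpenAI-style client reads (`choices[].delta.content` for streamed chunks, `choices[].message.content` for the buffered response); real responses carry more fields such as `id` and `model`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class StreamingSketch {

    /** Hypothetical stand-in for a generator; NOT Jlama's actual interface. */
    interface TokenGenerator {
        // onToken fires once per generated token, onComplete once at the end.
        void generate(String prompt, Consumer<String> onToken, Consumer<String> onComplete);
    }

    /** Toy generator that "generates" by echoing the prompt word by word. */
    static final TokenGenerator ECHO = (prompt, onToken, onComplete) -> {
        StringBuilder full = new StringBuilder();
        for (String word : prompt.split(" ")) {
            String token = word + " ";
            full.append(token);
            onToken.accept(token);
        }
        onComplete.accept(full.toString().trim());
    };

    /** stream=true: one server-sent event per token, then the [DONE] sentinel. */
    static List<String> streamed(TokenGenerator generator, String prompt) {
        List<String> events = new ArrayList<>();
        generator.generate(prompt,
            token -> events.add("data: {\"choices\":[{\"delta\":{\"content\":\""
                + escape(token) + "\"}}]}\n\n"),
            full -> events.add("data: [DONE]\n\n"));
        return events;
    }

    /** stream=false: ignore per-token callbacks and answer once generation completes. */
    static String buffered(TokenGenerator generator, String prompt) {
        String[] body = new String[1];
        generator.generate(prompt,
            token -> { },
            full -> body[0] = "{\"choices\":[{\"message\":{\"role\":\"assistant\",\"content\":\""
                + escape(full) + "\"}}]}");
        return body[0];
    }

    /** Minimal escaping (backslash and quote only); a real server would use a JSON library. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        streamed(ECHO, "hello streaming world").forEach(System.out::print);
        System.out.println(buffered(ECHO, "hello streaming world"));
    }
}
```

In the streamed case the completion callback is what lets the server emit the final `data: [DONE]` event and close the connection; in the buffered case it is the only point where the full response body is known.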
Working PR here: https://github.com/tjake/Jlama/pull/23