
Create server_streaming.py


This pull request adds the server_streaming.py example for the high-level API.

The server_streaming.py file is created under the examples/high_level_api/ directory. It implements a server streaming example using the high-level API: the code sends chat messages via an HTTP POST request to "http://localhost:8000/v1/chat/completions" and processes the response. When streaming is disabled, the full response content is printed at once; when streaming is enabled, the response is processed chunk by chunk and each parsed piece of content is printed as it arrives. A sketch of this flow is shown below.
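For reference, here is a minimal sketch of what such a script could look like. The endpoint URL comes from the description above; the `chat` helper name, the payload fields, and the SSE parsing details are assumptions based on the OpenAI-compatible format the llama-cpp-python server exposes, not necessarily the exact code in this PR:

```python
import json

import requests

# Endpoint from the PR description; adjust host/port for your server.
URL = "http://localhost:8000/v1/chat/completions"


def chat(messages, stream=False):
    """Send a chat completion request and print the reply.

    With stream=False the full response is printed at once; with
    stream=True each server-sent event chunk is parsed and printed
    as it arrives.
    """
    payload = {"messages": messages, "stream": stream}
    resp = requests.post(URL, json=payload, stream=stream)
    resp.raise_for_status()

    if not stream:
        # Non-streaming: the whole completion arrives in one JSON body.
        print(resp.json()["choices"][0]["message"]["content"])
        return

    # Streaming: the server sends OpenAI-style SSE lines of the form
    # "data: {...}", terminated by "data: [DONE]".
    for line in resp.iter_lines():
        if not line:
            continue
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break
        # Each chunk carries an incremental "delta"; the first chunk may
        # contain only the role, so fall back to an empty string.
        delta = json.loads(data)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
    print()


if __name__ == "__main__":
    chat([{"role": "user", "content": "Hello!"}], stream=True)
```

Toggling the `stream` flag switches between the two modes described above: `stream=False` waits for the complete JSON response, while `stream=True` consumes the response incrementally so output appears as the server produces it.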

This pull request provides a practical example of server streaming with the high-level API, showing how to interact with an HTTP server and handle both streaming and non-streaming responses.

zinccat avatar Jun 22 '23 07:06 zinccat

What is the purpose of this example? Is there a way to get real streaming, i.e. starting to show the user output while the LLM behind the REST server is still processing?

AlessandroSpallina avatar Sep 02 '23 15:09 AlessandroSpallina